We’ve all been there… making great progress getting things working... click click here, click click there, and you’re basking in your mastery of the computer code.
But then we change something and it all breaks down...
So what now? How do we get back on our feet and get back to cruising through code and configuration? Well, this post will address what to do when things go awry. We’ll go over where to start looking if something doesn’t go right and some common issues that everyone runs into.
First, an analogy: integrations are like water flowing through a system of pipes. There is a source where the water originates, it travels from this pipe to that pipe, and it comes out the other side. In our case, all pipes lead to xMatters! Er, except when they lead away from xMatters, like when we have callbacks, in which case xMatters is the source and the water goes out this pipe to that pipe and comes out at the other side. But the concept is the same.
But as often happens in the process of developing an integration, the water doesn’t come out the other side. In a pipe system, that happens for one of two reasons: A) the water is going out a pipe we aren’t expecting, or B) the pipe is blocked.
In the first, rather simple case of water/data going out a pipe we aren’t expecting, the cause is generally a wrong URL endpoint, and a quick but thorough look at the configuration on the source system can help there. In some cases there might be a proxy server set up by those sneaky network admin people. As I understand it, those people like beer. So a beer and a friendly email might help resolve a proxy mix-up situation.
The second case is more complicated. A blocked pipeline can cause a mess, but fortunately in computer code, unlike actual pipes, it typically doesn’t assault the senses! As I build integrations, this is always something I run into. I’ll fire off some code and expect to see the event in xMatters. I wait and wait, watching the end of the pipe for the tell-tale signs of a live event. My hopes fall as time goes on. Fortunately, there is a straightforward way to troubleshoot this situation: you just peek into the pipe along the way. As you inspect sequential slices of the pipe, you can isolate where the blockage is.
Let’s take a concrete example to better visualize this. Take the diagram below, which displays the pipeline of the ServiceNow integration.
An Incident update triggers a Business Rule (1), which then calls a Script Include (2). From there, the workflow makes the jump via a REST Web Service call to xMatters (3) and, if successful, the event will show up in the Event Activity Report (4). Regardless of success, an entry will be posted to the Web Services Audit Report (5) with an indication of the HTTP status.
Starting at the source, ServiceNow, we can add some log statements to either (or both!) the Business Rule and the Script Include. I often just use the phrase “See me? Business Rule 1” or something to indicate it got there. Occasionally, throwing in an important variable or two can be helpful as well. When we inspect the System Log, if we see our statement from (1), but not (2), then we know the blockage in the pipe is after our log statement in (1), but before our log statement in (2). We can then add a few more statements after the initial statement in (1) to help track it down.
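To make the "See me?" idea concrete, here is a minimal sketch of checkpoint logging across two stages. ServiceNow scripts are written in JavaScript, so treat this Python as a language-neutral illustration only; the function names, the incident fields, and the missing-priority "blockage" are all made up for the example.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("integration")


def business_rule(incident):
    # Hypothetical stage (1): announce arrival and show one useful variable.
    log.info("See me? Business Rule 1 (incident=%s)", incident["number"])
    if incident.get("priority") is None:
        # A quiet early return like this is a typical "blockage".
        # The extra statement tells us exactly where the water stopped.
        log.info("See me? Business Rule 1b: no priority, stopping here")
        return None
    return script_include(incident)


def script_include(incident):
    # Hypothetical stage (2): if this line never shows up in the log,
    # the blockage is somewhere after the first statement in stage (1).
    log.info("See me? Script Include 2 (priority=%s)", incident["priority"])
    return {"number": incident["number"], "priority": incident["priority"]}
```

Running business_rule with an incident that has no priority logs the statements from (1) but never the one from (2), which is exactly the pattern described above.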
Step 3 is where the pipe jumps from one application to the other. If we see both our statements from (1) and (2) in the System Log, then we should check the xMatters side to see if anything happened there. The first place to check is the Event Activity Report. This is the first thing you see on the Reports tab:
It is helpfully sorted so the newest stuff is on top. Hmmm, looks like our last activity was 1 day ago. So no event here. This report only shows active events, but there is actually a lot that goes on before the “pipe” reaches here. The next place to look is the Web Services Audit Report. This is also on the Reports tab, at the bottom of the menu.
Here we can see a couple of 404 Not Found errors. This means ServiceNow did send the event to xMatters, but xMatters rejected it because the endpoint was not found. If, on the other hand, we didn’t see anything here and we are working with an on-premises application such as Splunk, we might have a chat with our network engineering team or go back to the configuration section and triple, quadruple check the Web Services endpoint URL.
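When triple-checking the endpoint URL, it can help to poke the pipe by hand from a machine on the same network as the source system. The sketch below uses only the Python standard library; the probe function is a hypothetical helper, and the URL and payload you pass in are whatever your own configuration says, not anything defined by xMatters here. It separates "the server answered with an error" (an entry should appear in the audit report) from "nothing answered at all" (think proxy, firewall, or wrong host).

```python
import json
import urllib.error
import urllib.request


def probe(url, payload, timeout=10):
    """POST payload as JSON to url and return (status, detail).

    Hypothetical helper for illustration; status is None when no
    HTTP response came back at all.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, "request reached the server"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, "server answered with an error: " + str(err.reason)
    except OSError as err:
        # No HTTP answer: DNS failure, refused connection, timeout, etc.
        # (urllib.error.URLError is a subclass of OSError.)
        return None, "no response: " + str(err)
```

A result of (404, ...) points back at the path in the URL, while (None, ...) suggests the request never made it out of the building, and that is when the beer and the friendly email come in.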
Getting familiar with the HTTP status codes is a good idea. For now, just remember that anything starting with a 2 is good; for our purposes, treat everything else as bad. In the report above, you can see a 200 OK from a ServiceNow User and a 400 Bad Request also from panda. All of these are event trigger REST calls, and you can see the path to the Communication Plan (fka Relevance Engine, RIP). The other two, from this TD_IAUSER guy, are actually SOAP web service calls. You can tell because the Method Name is one of the Data SOAP web service calls, FindPersons in this case. Unfortunately, this report doesn’t have the details of the response code, so we’d have to dig that out of the source system.
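The "starts with a 2" rule of thumb is easy to turn into a tiny helper for reading audit entries. This is just a sketch using Python's standard http module; the describe function and its verdict wording are my own invention, not something the report produces.

```python
from http import HTTPStatus


def describe(code):
    """Return a short, human-readable verdict for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code the standard library knows about.
        phrase = "Unknown"
    if 200 <= code < 300:
        verdict = "good: the request was accepted"
    elif 400 <= code < 500:
        verdict = "bad: the request itself is wrong (URL, auth, or payload)"
    elif 500 <= code < 600:
        verdict = "bad: the receiving server failed"
    else:
        verdict = "unexpected for an event trigger; look closer"
    return f"{code} {phrase} -> {verdict}"
```

For the codes seen in this post, describe(200), describe(400), and describe(404) give the standard phrases OK, Bad Request, and Not Found, each followed by the matching verdict.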
Building integrations can be complicated. But the best thing to keep in mind is that an integration is just moving data from one place to another through a pipe. By adding your own windows, you can peer through each one in order and make sure the data isn’t blocked and appears as expected.
In future posts, we’ll explore how to debug individual integrations. So if you have a request, throw it in the comments and I’ll add it to the list!