Jonatan Buus
We're currently using Grafana to trigger alarms via xMatters whenever an alert is triggered. Grafana will automatically send out another notification whenever an alert goes into repair.
Is it possible to automatically acknowledge or stop the previous alarm and prevent further escalation in xMatters when receiving a repair notification from Grafana?
Comments
Absolutely. You could write something in the Integration Builder, or you could use Flow Designer. It would probably be quicker to test it out in Flow Designer, so I'd recommend starting there. You'd want to use the get event and terminate event steps to do this; e.g. have your flow identify repair notifications, then find and terminate the events in question.
More info about these flow steps is available here:
https://help.xmatters.com/ondemand/xmodwelcome/flowdesigner/flow-tools.htm?cshid=FlowFindEventsStep#GetEvents
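As a rough sketch of the "find and terminate" idea in script form (not the Flow Designer steps themselves), the logic could look something like this. `httpRequest` is a hypothetical injected helper standing in for whatever HTTP client the integration uses, and the terminate call (a POST to /api/xm/1/events with an id and a status of TERMINATED) is an assumption to verify against the xMatters REST API documentation.

```javascript
// Sketch only: `httpRequest` is a hypothetical helper, not an xMatters API.
// It is expected to take { method, path, body } and return the parsed JSON response.
function terminateEventsForRule(httpRequest, propertyName, propertyValue) {
  var terminated = [];
  // Events that can still escalate may be ACTIVE or SUSPENDED, so search both.
  ['ACTIVE', 'SUSPENDED'].forEach(function (status) {
    var path = '/api/xm/1/events?status=' + status +
      '&propertyName=' + encodeURIComponent(propertyName) +
      '&propertyValue=' + encodeURIComponent(propertyValue);
    var found = httpRequest({ method: 'GET', path: path });
    (found.data || []).forEach(function (event) {
      // Assumed request shape for changing an event's status.
      httpRequest({
        method: 'POST',
        path: '/api/xm/1/events',
        body: { id: event.id, status: 'TERMINATED' }
      });
      terminated.push(event.id);
    });
  });
  return terminated;
}

// Usage with a fake client, so nothing is sent to a real instance.
var calls = [];
function fakeHttpRequest(req) {
  calls.push(req);
  if (req.method === 'GET' && req.path.indexOf('status=ACTIVE') !== -1) {
    return { count: 1, total: 1, data: [{ id: 'event-1' }] };
  }
  return { count: 0, total: 0, data: [] };
}
// The property name may need a language suffix, as discussed later in this thread.
var terminatedIds = terminateEventsForRule(fakeHttpRequest, 'ruleId', '52');
console.log(terminatedIds);
```

Injecting the HTTP helper keeps the search-then-terminate logic testable without a live xMatters instance.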
Thanks Travis,
You're right, we haven't migrated to the Flow Designer yet so are still using the Integration Builder scripts.
Most of your example makes sense; however, I'm a bit uncertain what I should pass to the part below:
I logged an alert from Grafana with the below data:
title: [Alerting] Test Alert
state: ok
ruleId: 52
ruleName: Test Alert
Based on the above, I guess it'd be something like the following:
Yep, looks about right.
I'd say just before you test it, throw this line in just to make sure.
Happy Coding!
We have implemented Travis' suggestion and are trialing automatically terminating alarms when a repair is received from Grafana. Unfortunately, it appears that alarms still get escalated even though a repair has been received.
In the log extracts below the following important events take place:
The notifications in 3) & 4) should not have been sent as the alarm should automatically have been terminated upon receiving the repair from Grafana in 2).
Log entries from the Activity Stream for the Grafana Integration
> GET https://cellpointmobile.xmatters.com/api/xm/1/events?status=ACTIVE&propertyName=ruleId&propertyValue=31 HTTP/1.1
> Accept: text/plain, application/json, application/cbor, application/*+json, */*
> User-Agent: Xerus (EndpointClient)
> X-Trace: fad7ef78-02f3-cb9f-6ae2-dda6b362be45,6f3dfe19-9168-4a96-adf6-640667e7318c,e0ff9397-5edd-4400-b286-93c3bc0cc5b4
> Content-Type: application/json; charset=UTF-8
> Authorization: Bearer ********
> X-Integration-UUID: 52f54293-6f69-4d03-badb-ae5f145ac1a7
> X-Flow-Trace: 52f54293-6f69-4d03-badb-ae5f145ac1a7:1579519236559
> Content-Length: 0
< HTTP/1.1 200 OK OK
< date: Mon, 20 Jan 2020 11:20:41 GMT
< x-trace: fad7ef78-02f3-cb9f-6ae2-dda6b362be45,6f3dfe19-9168-4a96-adf6-640667e7318c,e0ff9397-5edd-4400-b286-93c3bc0cc5b4,8f997c54-3418-4bca-87ed-60414904c157
< x-application-context: application:overrides
< content-type: application/json;charset=utf-8
< x-xss-protection: 1; mode=block
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: 0
< x-envoy-upstream-service-time: 104
< server: envoy
< transfer-encoding: chunked
< X-Robots-Tag: noindex
< X-FRAME-OPTIONS: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload;
< X-Content-Type-Options: nosniff
< Via: 1.1 google
< Alt-Svc: clear
{"count":0,"total":0,"data":[],"links":{"self":"/api/xm/1/events?limit=100&propertyValue=31&offset=0&propertyName=ruleId&status=ACTIVE"}}
> GET https://cellpointmobile.xmatters.com/api/xm/1/events?status=SUSPENDED&propertyName=ruleId&propertyValue=31 HTTP/1.1
> Accept: text/plain, application/json, application/cbor, application/*+json, */*
> User-Agent: Xerus (EndpointClient)
> Content-Type: application/json; charset=UTF-8
> Authorization: Bearer ********
> X-Integration-UUID: 52f54293-6f69-4d03-badb-ae5f145ac1a7
> X-Trace: fad7ef78-02f3-cb9f-6ae2-dda6b362be45,6f3dfe19-9168-4a96-adf6-640667e7318c,5bdbba4d-4b80-4d42-a972-b0afc6b4a459
> X-Flow-Trace: 52f54293-6f69-4d03-badb-ae5f145ac1a7:1579519236559
> Content-Length: 0
< HTTP/1.1 200 OK OK
< date: Mon, 20 Jan 2020 11:20:41 GMT
< x-trace: fad7ef78-02f3-cb9f-6ae2-dda6b362be45,6f3dfe19-9168-4a96-adf6-640667e7318c,5bdbba4d-4b80-4d42-a972-b0afc6b4a459,9edeffd5-9139-4615-9ca2-2f476ffa9a28
< x-application-context: application:overrides
< content-type: application/json;charset=utf-8
< x-xss-protection: 1; mode=block
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: 0
< x-envoy-upstream-service-time: 62
< server: envoy
< transfer-encoding: chunked
< X-Robots-Tag: noindex
< X-FRAME-OPTIONS: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload;
< X-Content-Type-Options: nosniff
< Via: 1.1 google
< Alt-Svc: clear
{"count":0,"total":0,"data":[],"links":{"self":"/api/xm/1/events?limit=100&propertyValue=31&offset=0&propertyName=ruleId&status=SUSPENDED"}}
Terminating event(s) for Rule: ELB - Active Connections(31) due to State: "ok" from Grafana with Result: true
Log extract from the Event log for the alarm
Jan 20, 2020 12:13:46.988 CET Device Android Push User A - Android Phone will be notified
Jan 20, 2020 12:13:46.989 CET Device Email User A - Work Email will be notified
Jan 20, 2020 12:13:46.990 CET Device Voice User A - Work Phone will be notified
Jan 20, 2020 12:13:46.990 CET Device Email User A - Work Email is Active
Jan 20, 2020 12:13:47.229 CET Device Email User A - Work Email - Notification will be processed by (x)Matters Trial SMTP Email PP
Jan 20, 2020 12:13:47.230 CET Device Voice User A - Work Phone is Active
Jan 20, 2020 12:13:47.254 CET Device Voice User A - Work Phone - Alternate Protocol Providers are unavailable for this device.
Jan 20, 2020 12:13:47.254 CET Device Voice Service Provider (x)matters Voice Gateway does not have a valid and enabled Protocol Provider
Jan 20, 2020 12:13:47.255 CET Device Android Push User A - Android Phone is Active
Jan 20, 2020 12:13:47.367 CET Device Android Push User A - Android Phone - Notification will be processed by (x)Matters GCM PP
Jan 20, 2020 12:13:47.447 CET Device Email User A - Work Email - Notification delivered
Jan 20, 2020 12:13:47.488 CET Device Android Push User A - Android Phone - Notification delivered
Jan 20, 2020 12:23:47.275 CET User TOPS - Weekly 24/7 Shift - member User B (user.b) - escalation has occurred to the next level
Jan 20, 2020 12:23:47.396 CET Device Email User B - Work Email will be notified
Jan 20, 2020 12:23:47.397 CET Device Android Push User B - Android Phone will be notified
Jan 20, 2020 12:23:47.400 CET Device Android Push User B - Android Phone is Active
Jan 20, 2020 12:23:47.647 CET Device Android Push User B - Android Phone - Notification will be processed by (x)Matters GCM PP
Jan 20, 2020 12:23:47.648 CET Device Email User B - Work Email is Active
Jan 20, 2020 12:23:47.713 CET Device Email User B - Work Email - Notification will be processed by (x)Matters Trial SMTP Email PP
Jan 20, 2020 12:23:47.867 CET Device Email User B - Work Email - Notification delivered
Jan 20, 2020 12:23:47.892 CET User TOPS - Weekly 24/7 Shift - member User C (user.c) - escalation has occurred to the next level
Jan 20, 2020 12:23:47.906 CET Device Android Push User B - Android Phone - Notification delivered
Hey Jonatan! Glad to hear you're making progress!
For the events in situations 3 & 4, what is the "ruleId" property value? The terminate log shows that the search didn't find any events with a "ruleId" of "31":
{"count":0,"total":0,"data":[],"links":{"self":"/api/xm/1/events?limit=100&propertyValue=31&offset=0&propertyName=ruleId&status=ACTIVE"}}
Is that value populated in those events and if so, what is the value?
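One way to make an empty search stand out in the Activity Stream is to check the `count` field of the search response before logging a result. A minimal sketch, assuming the response body has the shape shown above; `describeSearchResult` is a hypothetical helper name, not part of any xMatters library.

```javascript
// Hypothetical helper: turns an event-search response body into a log message.
function describeSearchResult(responseBody, status) {
  var parsed = JSON.parse(responseBody);
  var count = parsed.count || 0;
  if (count === 0) {
    return 'No ' + status + ' events matched the search; nothing to terminate';
  }
  return 'Found ' + count + ' ' + status + ' event(s) to terminate';
}

// The response body from the log extract above.
var body = '{"count":0,"total":0,"data":[],"links":{"self":"/api/xm/1/events?limit=100&propertyValue=31&offset=0&propertyName=ruleId&status=ACTIVE"}}';
console.log(describeSearchResult(body, 'ACTIVE'));
```

Logging the match count alongside the result makes it obvious when a "successful" terminate actually matched no events.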
Thanks for the analysis, Travis. The ruleId should be populated for all events, shouldn't it?
Here's the complete script we use, please note the lines marked with bold:
// Instantiate the library for use
var util = require('Util');
var payload = JSON.parse(request.body); // Use this if the payload is a JSON object in the request body
/** Customize form properties **/
trigger.properties.Title = payload.title;
trigger.properties.State = payload.state;
trigger.properties.ruleId = payload.ruleId;
trigger.properties.ruleName = payload.ruleName;
trigger.properties.ruleUrl = payload.ruleUrl;
// Terminate the existing events for Rule
if (trigger.properties.State == 'ok')
{
    var result = util.terminateEvents({'ruleId': trigger.properties.ruleId});
    console.log('Terminating event(s) for Rule: '+ trigger.properties.ruleName +'('+ trigger.properties.ruleId +') due to State: "'+ trigger.properties.State +'" from Grafana with Result: '+ result);
}
// Discard event for [No Data] notifications sent from Grafana
else if (!payload.evalMatches || payload.evalMatches.length === 0)
{
    console.log('Title: '+ trigger.properties.Title);
    console.log('State: '+ trigger.properties.State);
    console.log('Rule ID: '+ trigger.properties.ruleId);
    console.log('Rule Name: '+ trigger.properties.ruleName);
    console.log('Rule URL: '+ trigger.properties.ruleUrl);
    console.log('Ignoring this trigger due to evalMatches being empty');
}
// Create new Event
else
{
    // Set the event priority:
    trigger.priority = "High";
    trigger.properties.Message = payload.message +'\n'+ 'Value: ' + payload.evalMatches[0].value + ' ' + 'Metric: ' + payload.evalMatches[0].metric;
    form.post(trigger);
}
Please let me know if I'm missing something in the above, which prevents the event from being found using the ruleId?
Also note that the following imports the script you previously provided in this thread per your suggestions:
// Instantiate the library for use
var util = require('Util');
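The three branches of the integration script above (terminate, ignore, create) can also be pulled out into a small pure function, which makes the routing easy to check without sending anything to xMatters. This is only a sketch: `classifyGrafanaPayload` and its return values are hypothetical names, and the payload fields are the ones the script already reads.

```javascript
// Hypothetical helper mirroring the if / else if / else chain in the script.
function classifyGrafanaPayload(payload) {
  // A repair from Grafana arrives with state "ok".
  if (payload.state === 'ok') {
    return 'terminate';
  }
  // [No Data] notifications carry no evalMatches and are discarded.
  if (!payload.evalMatches || payload.evalMatches.length === 0) {
    return 'ignore';
  }
  // Anything else creates a new event.
  return 'create';
}

var repair = { title: '[Alerting] Test Alert', state: 'ok', ruleId: 52, ruleName: 'Test Alert' };
var noData = { state: 'no_data', ruleId: 52, evalMatches: [] };
var alerting = { state: 'alerting', ruleId: 52, evalMatches: [{ metric: 'cpu', value: 97 }] };

console.log(classifyGrafanaPayload(repair));
console.log(classifyGrafanaPayload(noData));
console.log(classifyGrafanaPayload(alerting));
```

The sample payloads are illustrative; only the first one is based on the test alert logged earlier in the thread.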
Ah! I bet it is our search that's the problem. Instead of passing just "ruleId", pass the property name with the #en language suffix. Like so:
Give that a shot and let me know how it goes.
Thanks for the reply Travis, have updated the script per your suggestion.
However, it seems counterintuitive that the ruleId property is localized and thus has to be postfixed with #en in the lookup. An ID, in my mind, is never localized, as its purpose is to give a unique reference.
Are there any grand thoughts behind this approach that would be useful to understand?
It's actually all the properties that are localized. The API call to search for events based on a property requires the name to be localized, which makes sense: if you are localizing the properties, you need to search the right one.
However, it seems like setting a default might be nice so that if you aren't passing the localization, then it is assumed. Especially for instances that only use one language.
I'll see about getting an enhancement request opened for that.
I hope that helps.
Happy Friday!
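Until a default like that exists, a script could apply one itself. The sketch below appends a language suffix when none is given and builds the search path with it. `localizePropertyName` and `buildEventSearchPath` are hypothetical helper names, the "en" default is an assumption for a single-language instance, and whether your HTTP client already percent-encodes the path is something to verify before using this as-is.

```javascript
// Hypothetical helper: append a language suffix unless one is already present.
function localizePropertyName(name, defaultLanguage) {
  return name.indexOf('#') === -1 ? name + '#' + (defaultLanguage || 'en') : name;
}

// Build the event search path. A raw '#' would start a URL fragment,
// so the localized name is percent-encoded.
function buildEventSearchPath(status, propertyName, propertyValue) {
  return '/api/xm/1/events?status=' + encodeURIComponent(status) +
    '&propertyName=' + encodeURIComponent(localizePropertyName(propertyName)) +
    '&propertyValue=' + encodeURIComponent(propertyValue);
}

console.log(buildEventSearchPath('ACTIVE', 'ruleId', 31));
// prints /api/xm/1/events?status=ACTIVE&propertyName=ruleId%23en&propertyValue=31
```

A name that already carries a suffix, such as "ruleId#fr", is passed through unchanged.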