GIG: xMatters and AWS

Between travel and coffee shops, I rarely spend more than a few days in one place, and it is difficult to keep asking network owners or coffee shop proprietors to open a port in their firewall just so I can send stuff into an Integration Agent or set up a quick web server to simulate ovens. Generally the barista gives me a blank look or just says "no". Fortunately, Amazon will give you an instance for about $15 a month. It is nothing fancy and nowhere near a workhorse, but for my several-times-a-month web requests, it is very handy.

However, from time to time I forget to shut down my Integration Agent, which burns CPU cycles doing nothing, and we get charged something like 30 cents extra. So I spent several hours figuring out how I could use xMatters to remind me that I left it running, and even provide a response option to shut the agent down!

Follow along after the break for the technical adventures. 


CloudWatch is Amazon's monitoring service for the EC2 infrastructure. It has all kinds of handy metrics you can alert on, such as CPU utilization, disk operations, and network activity. These alerts can be very handy in keeping down the cost of running all your VMs, which generally makes everyone happier. CloudWatch is the monitoring arm, but it isn't terribly useful without another Amazon service, the Simple Notification Service (SNS). This is the service that actually tells someone about an alarm. It can do all kinds of things, but we are most interested in its ability to make an HTTPS call.

At a high level, CloudWatch will monitor the attributes we choose (in this case CPU usage) and tell SNS to do something. We'll configure SNS to make an HTTPS call over to the Integration Builder, which will then create an event. A response to that event is then picked up by an Integration Agent (IA), which fires the shutdown command.
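To make the SNS-to-xMatters hop a little more concrete, here is a minimal sketch of how an inbound script might unpack what SNS posts when an alarm fires. SNS wraps the alarm in a JSON envelope whose Message field is itself a JSON string holding the CloudWatch alarm details. The function name parseAlarmNotification and the shape of the returned object are my own inventions for illustration, and the sketch assumes the scripting environment provides JSON.parse; it is not the script shipped in the Communications Plan.

```javascript
// Sketch only: unpack a CloudWatch alarm delivered through SNS.
// parseAlarmNotification is a hypothetical helper, not part of any xMatters API.
function parseAlarmNotification(rawBody) {
    var envelope = JSON.parse(rawBody);

    // SNS also posts subscription confirmations; only handle real notifications here.
    if (envelope.Type !== "Notification") {
        return null;
    }

    // The Message field is a JSON string, so it needs a second parse.
    var alarm = JSON.parse(envelope.Message);

    return {
        name: alarm.AlarmName,
        state: alarm.NewStateValue,
        reason: alarm.NewStateReason
    };
}

// Example with a hand-built body that mimics the SNS envelope.
var sampleBody = JSON.stringify({
    Type: "Notification",
    Message: JSON.stringify({
        AlarmName: "IA Probably Running",
        NewStateValue: "ALARM",
        NewStateReason: "Stuff happened"
    })
});
var parsed = parseAlarmNotification(sampleBody);
```

The values pulled out this way are the sort of thing you would map onto event properties before creating the xMatters event.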

So, to get started, we are actually going to start near the end. Import the AWS - CloudWatch Communications Plan, dig out the integration URL for Inbound from SNS, and copy it somewhere handy. While we're here, add a user or group to the recipients section of the Long Running IA. (Or you could create a subscription panel and subscribe some people based on criteria from the alarm. For now we'll "hard code" the recipients.) Then head over to the SNS console Topics and hit Create Topic.

Give it a descriptive name and hit Create Topic. 

After the Topic is created, an ARN (Amazon Resource Name) is generated. We'll need this value for the subscription we are going to create next. 

Cruise over to Subscriptions and click Create Subscription. In the dialog presented, paste in the ARN from the topic above, select HTTPS, then dig out the URL we copied from the Inbound from SNS integration in xMatters and paste it in as the URL. Clicking the Create Subscription button creates the subscription, and a message is then displayed stating that a confirmation will be sent. This helps confirm you own the target URL. We all remember the CatFacts fiasco.

To confirm the Subscription, head over to xMatters and click the Activity Stream for the Inbound from SNS inbound integration. You should see a new entry there with a payload. In the body, look for the SubscribeURL element and copy the URL in there.

Paste it into a browser and you will see an XML response: (In theory we could add some logic to the script to take care of this for us. I think just a GET request would do it.)
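As a rough sketch of the logic mentioned above, the script would first need to spot a confirmation message and pull out the URL to call. SNS marks these messages with a Type of "SubscriptionConfirmation" and puts the link in SubscribeURL. The helper name getSubscribeUrl and the example URL are made up for illustration, and the actual GET request is left out because it depends on the HTTP client available in your script environment.

```javascript
// Sketch only: find the confirmation URL in an SNS HTTPS POST body.
// getSubscribeUrl is a hypothetical helper name.
function getSubscribeUrl(rawBody) {
    var msg = JSON.parse(rawBody);

    // Ordinary notifications have nothing to confirm.
    if (msg.Type !== "SubscriptionConfirmation") {
        return null;
    }

    return msg.SubscribeURL || null;
}

// Example with a placeholder URL (not a real SNS endpoint).
var confirmBody = JSON.stringify({
    Type: "SubscriptionConfirmation",
    SubscribeURL: "https://sns.example.invalid/?Action=ConfirmSubscription&Token=abc"
});
var confirmUrl = getSubscribeUrl(confirmBody);
```

If the function returns a URL, a single GET against it should have the same effect as pasting it into the browser.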

Once you get that XML back, the Subscription in SNS will be updated from "PendingConfirmation" to showing the ARN for the Subscription.

Great! Progress! 

Now, we can reference the Topic ARN in a CloudWatch Alarm. So head over to CloudWatch and click Create Alarm. 

This part may differ depending on what you need to monitor, but since I'm looking to monitor my EC2 instance, I'll select EC2 from the drop-down, which gives me a list of metrics to monitor. I selected CPUCreditUsage and I see a nice graph of the CPU on the box:

This is actually rather informative. I shut down my IA manually around 18:45 (GMT) and you can see the huge spike, then the usage drops to almost nothing. Using the cursor I can see that when active, the IA is using about 0.03 CPU credits (the unit of the CPUCreditUsage metric). So, let's make an alarm: if the CPU usage is > 0.03 for 12 hours, activate the SNS Topic, which will then send it over to xMatters. Click Next to get to the Alarm Definition page.
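To spell out what "greater than 0.03 for 12 hours" means, here is a simplified model of the check, assuming one datapoint per one-hour period. This is only an illustration of the idea; CloudWatch's real evaluation has more moving parts (missing data handling, for instance), and the function name breachesThreshold is hypothetical.

```javascript
// Sketch only: true when the last `periods` datapoints are all above `threshold`.
// This is a simplified model, not CloudWatch's actual evaluation logic.
function breachesThreshold(datapoints, threshold, periods) {
    // Not enough history yet to make a decision.
    if (datapoints.length < periods) {
        return false;
    }

    var recent = datapoints.slice(-periods);
    for (var i = 0; i < recent.length; i++) {
        if (!(recent[i] > threshold)) {
            return false;
        }
    }
    return true;
}

// Twelve hourly readings with the IA running the whole time.
var busy = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05];
// Same, but the IA was idle for one hour in the middle.
var idleOnce = busy.slice(0, 5).concat([0.01], busy.slice(6));

var busyAlarm = breachesThreshold(busy, 0.03, 12);         // true
var idleAlarm = breachesThreshold(idleOnce, 0.03, 12);     // false
```

A single quiet hour resets the count, which is why a long consecutive window works well for catching an agent that was simply left on.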

On this page, give the Alarm a name and a helpful description. The "periods" are defined in the lower right, so I have 12 consecutive periods of 1 hour. The Actions section is where you will enter the ARN for the topic we created above. Note that you will need to click "Enter List" to be able to paste in the ARN. Then click Create Alarm.

 

Great! Over here I have one pre-baked and you can see the Alarm turns red when the conditions are met.

 

xMatters Configuration

This part is optional. If you are just interested in getting alerts out of CloudWatch then you can skip over this section and get right to the Testing below. 

We'll register the "cloudwatch" integration service presented by the IA, so head over to the Developer tab > Event Domains area and click on the entry called "applications". You might remember from our Second Look at Callbacks that event domains are where we tell xMatters about a particular integration service. The IA will then log in and tell xMatters it can service these integration callbacks.

So, again, in the "applications" event domain, scroll down to integration services and click Add New. Populate the name with "cloudwatch" (all lowercase) and leave the rest blank and hit Save. 

Then, go ahead and download the cloudwatch.zip file below and unzip it to IAHOME/integrationservices. Open up the IAHOME/conf/IAConfig.xml and scroll down to the service-configs element. Add this line inside that tag:

<path>cloudwatch/cloudwatch.xml</path>

Save the file and restart the Integration Agent. 

 

Testing

Well, the obvious way to test this is to fire up the IA and wait 12 hours... or we could use the handy aws command line interface (CLI). In particular, we'll check out the set-alarm-state command. If you run it and get some errors about authentication, see here for getting set up. So, I put this command together and fired it off:

aws cloudwatch set-alarm-state --alarm-name "IA Probably Running" --state-reason "Stuff happened" --state-value ALARM

This triggers a call into xMatters, which should in turn send the following email:

Then, clicking the terminate link will trigger the IA to do something, in this case run a command to stop the IA process. 

 

Gory Technical Details for the IA

The code on the IA side is pretty straightforward. I just stole the execute function from a previous post and updated the command to fire. The whole script looks like this:

importPackage(java.lang);
importPackage(java.io);

load("lib/integrationservices/javascript/event.js");
load("lib/integrationservices/javascript/xmio.js" );

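function apia_callback( msg ) {

    // Only act on a 'terminate' response coming back from xMatters.
    if( msg.xmatters_callback_type == 'response' && msg.response.toLowerCase() == 'terminate' ) {

        // Log the working directory to help with troubleshooting.
        var cmds = [ "pwd" ];
        var stuff = execute( cmds );
        IALOG.debug( 'PWD: ' + stuff );

        // ProcessBuilder does not go through a shell, so a trailing "&" would be
        // passed to the script as a literal argument. The nowait flag below is
        // what keeps this call from blocking.
        cmds = [ "/etc/integrationagent-5.1.6/bin/stop_daemon.sh" ];

        IALOG.debug( "Command: " + cmds );

        // With nowait set, execute returns immediately and output is undefined.
        var output = execute( cmds, true );
        IALOG.debug( "Output: " + output );

    }
}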

/*
 * execute
 * Executes the specified command using a ProcessBuilder.
 * Pipes (|) only work when the command is run through a shell,
 * as in the example below.
 *
 * cmdArr - array of commands to fire. Ex:
 *          ["/bin/sh", "-c", "ls -l | grep stuff"]
 * nowait - if true, return right after starting the process
 */
function execute( cmdArr, nowait )
{
    // Start the process.
    var b = new ProcessBuilder( cmdArr );
    var process = b.start();

    // Fire and forget: no output is collected in this case.
    if( nowait )
    {
        return;
    }

    // Wait for the process to complete.
    process.waitFor();

    // Handle the normal and error cases.
    if (process.exitValue() === 0)
    {
        return copyToString(process.getInputStream());
    }
    else
    {
        throw new RuntimeException("The command '" + cmdArr.join( ' ' ) +
            "' failed with exit value " + process.exitValue() + ". " +
            "Output: " + copyToString(process.getInputStream() ) );
    }
}


/**
 * This method returns the contents of the specified input stream as a string.
 *
 * @throws IOException - If an I/O error occurs
 * @throws NullPointerException - If inputstream is null
 *
 * @return a non-null String
 */
function copyToString(inputstream)
{
    // Store output in a string buffer.
    var buf = new StringWriter();
    var writer = new BufferedWriter(buf);

    // Always close reader before returning.
    var reader = new BufferedReader(new InputStreamReader(inputstream));
    try
    {
        // Copy one line at a time.
        var line = reader.readLine();
        while (line != null)
        {
            writer.write(line);
            writer.newLine();
            line = reader.readLine();
        }

        // Return the output.
        writer.flush();
        return buf.toString();
    }
    finally
    {
        reader.close();
    }
}

 

And that's about it. I hope this session was helpful and instructive. I think there is room for improvement in determining who to notify rather than hard-coding the recipients. I noted above that you could create a subscription panel, but you could also put the recipients in the Alarm name, or maybe even create a custom field on the user record in xMatters and do a lookup.
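As one possible take on the recipients-in-the-Alarm-name idea, here is a small sketch that assumes a made-up naming convention: the alarm name ends with a comma-separated list of recipients in square brackets. Both the convention and the helper name recipientsFromAlarmName are hypothetical, so treat this as a starting point rather than a finished approach.

```javascript
// Sketch only: pull recipients out of an alarm name such as
// "IA Probably Running [alice, ops-team]". The bracket convention is invented here.
function recipientsFromAlarmName(alarmName) {
    var match = /\[([^\]]*)\]\s*$/.exec(alarmName);
    if (!match) {
        return []; // no bracketed list, so fall back to default recipients elsewhere
    }

    var parts = match[1].split(",");
    var recipients = [];
    for (var i = 0; i < parts.length; i++) {
        // Trim whitespace and skip empty entries.
        var name = parts[i].replace(/^\s+|\s+$/g, "");
        if (name) {
            recipients.push(name);
        }
    }
    return recipients;
}

var targets = recipientsFromAlarmName("IA Probably Running [alice, ops-team]");
var noTargets = recipientsFromAlarmName("IA Probably Running");
```

The resulting list could then be used to set the recipients on the event instead of the hard-coded ones.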

 

 

 

 

 
