Experts agree: change is the most common cause of incidents. The Google SRE Handbook says straight out: "Recent changes to a system can be a productive place to start identifying what's going wrong." Consider the following:
- "85% of performance incidents can be traced to changes." – Will Cappelli, Gartner
- "Roughly 70% of outages are due to changes in a live system." – Site Reliability Engineering, O’Reilly
Those are big numbers! Clearly, incident resolvers need visibility into recent changes as part of the service intelligence capabilities that keep your digital services continuously providing value to your customers.
Introducing change intelligence
Because digital services can experience thousands of changes per day, it’s critical to intelligently surface change information in a way that’s meaningful and actionable for resolvers.
With the latest enhancements to our digital operations platform, change information is embedded directly into the service intelligence capabilities of the xMatters incident management solution. By presenting relevant changes within the context of an incident, resolvers can identify recently changed services, gain greater insight into potential root causes, and immediately take action to mitigate and resolve the issue.
Identify what's changed
During an incident, resolvers can pivot from the Incident Console to the service dependencies map to view the extent of the impact across their service landscape. We’ve enhanced this view with visibility into recent changes to services impacted by the incident, as well as the services they depend on.
Use the ‘Highlight Recent Changes’ check box and timeframe selector to set the change window. Any services with changes in that period will be highlighted with a blue badge on the map:
In the example above, the Proxy service is identified as a potential root cause for the incident. However, Proxy depends on Customer Quota, which doesn’t appear broken but has been recently changed.
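The highlighting logic described above can be pictured as a small graph walk: start from the services impacted by the incident, follow their dependencies, and keep any service whose latest change falls inside the selected window. The sketch below is only an illustration of that idea, not how xMatters implements it. The dependency map, the `Storefront` service, the timestamps, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "Storefront": ["Proxy"],
    "Proxy": ["Customer Quota"],
    "Customer Quota": [],
}


def services_in_scope(impacted, depends_on):
    """Return the impacted services plus everything they depend on, transitively."""
    seen = set()
    stack = list(impacted)
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        stack.extend(depends_on.get(service, []))
    return seen


def recently_changed(impacted, depends_on, last_change, window, now):
    """Return in-scope services whose most recent change falls inside the window."""
    cutoff = now - window
    return sorted(
        service
        for service in services_in_scope(impacted, depends_on)
        if service in last_change and cutoff <= last_change[service] <= now
    )


# Illustrative data: Customer Quota changed two hours before "now".
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_change = {
    "Customer Quota": now - timedelta(hours=2),
    "Storefront": now - timedelta(days=3),
}

highlighted = recently_changed(
    ["Proxy"], DEPENDS_ON, last_change, timedelta(hours=24), now
)
print(highlighted)  # ['Customer Quota']
```

Even though the incident points at Proxy, the walk surfaces Customer Quota because Proxy depends on it and it changed within the 24-hour window.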
Identify how it was changed
To see detailed change data for a service, click the options menu for a service and select ‘View Changes’:
This opens a panel below the map with change records for the specific service, including:
- The time when each change occurred, normalized to your time zone.
- A summary of the change.
- The type of change: Source Code, Deployment, Orchestration, Feature Toggle, Manual, or Other.
- The source of the change.
- An external change ID, if applicable.
- Who or what made the change.
You can expand the panel to full screen, and click on any record in the list to view additional details that may have been provided about the change, in JSON, plain text, or another format:
Based on the change records in the example above, we’ve determined new code was deployed into production and there was a feature toggle change to Customer Quota. Even though the Proxy service is identified as a potential root cause of the incident, our insights into recent changes to our service landscape indicate that changes to Customer Quota may be the cause of the incident.
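To make the shape of a change record concrete, here is a minimal sketch that models the fields listed above as a Python data class. The class, attribute names, and sample values are assumptions made for illustration and do not reflect the actual xMatters schema. Only the list of change types comes from the description above. A fixed UTC offset stands in for the viewer's time zone to keep the example dependency-free.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    """Change types named in the change panel."""
    SOURCE_CODE = "Source Code"
    DEPLOYMENT = "Deployment"
    ORCHESTRATION = "Orchestration"
    FEATURE_TOGGLE = "Feature Toggle"
    MANUAL = "Manual"
    OTHER = "Other"


@dataclass
class ChangeRecord:
    """Hypothetical model of one change record (attribute names are assumptions)."""
    occurred_at: datetime               # timezone-aware, stored in UTC
    summary: str
    change_type: ChangeType
    source: str
    changed_by: str
    external_id: Optional[str] = None   # only when the source supplies one
    details: Optional[str] = None       # JSON, plain text, or another format

    def local_time(self, viewer_tz: tzinfo) -> datetime:
        """Normalize the change time to the viewer's time zone."""
        return self.occurred_at.astimezone(viewer_tz)


# Illustrative record: a feature toggle change to Customer Quota.
record = ChangeRecord(
    occurred_at=datetime(2024, 1, 15, 18, 30, tzinfo=timezone.utc),
    summary="Feature toggle changed on Customer Quota",
    change_type=ChangeType.FEATURE_TOGGLE,
    source="LaunchDarkly",
    changed_by="release-bot",
)

viewer_tz = timezone(timedelta(hours=-5))  # assumed viewer offset: UTC-05:00
print(record.local_time(viewer_tz).isoformat())  # 2024-01-15T13:30:00-05:00
```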
Take action
After you’ve identified the likely cause of the issue, you can take action directly from the service map by engaging subject matter experts as incident resolvers or by running automations that may be available for specific services.
In our example, we can click the Customer Quota service and select 'Notify to Engage' to pull in people from the team that owns that service as incident resolvers:
We can also choose to run a rolling restart or regional failover from the Proxy service's list of available automations:
How do I get change data into xMatters?
You’ll be able to feed change records into xMatters in several ways:
- By adding the ‘Add Change Record’ step to your flows in Flow Designer.
- By using one of our new low-code workflows or Flow Designer triggers designed to feed in changes from specific products, starting with GitLab, Azure Pipelines, Bitbucket, GitHub, and LaunchDarkly.
- Directly via the xMatters REST API.
- Manually through the web user interface.
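For the REST API route, a deployment script might assemble a change record and send it as a JSON POST. The sketch below only builds the request and does not send it. The base URL, the `/changes` path, the bearer-token header, and every payload field name are placeholders invented for illustration, so check the xMatters REST API documentation for the real endpoint, authentication, and schema.

```python
import json
from urllib.request import Request


def build_change_request(base_url: str, token: str, record: dict) -> Request:
    """Build (but do not send) a JSON POST carrying one change record.

    The path and auth scheme are placeholders, not the real xMatters API.
    """
    body = json.dumps(record).encode("utf-8")
    return Request(
        url=f"{base_url}/changes",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


# Hypothetical payload; the field names are assumptions.
record = {
    "service": "Customer Quota",
    "summary": "Deployed new build to production",
    "changeType": "Deployment",
    "source": "GitLab",
    "changedBy": "ci-pipeline",
}

request = build_change_request("https://example.invalid/api", "TOKEN", record)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` once the placeholders are replaced with real values.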
Availability
Change intelligence is included in our upcoming Pole Position Release:
- Early Access Program: TBD
- Non-production environments: Tuesday, June 21
- Production environments: Tuesday, July 12
While change intelligence features in the Incident Console are available for Base and Advanced plans only, customers on all plan levels can add change records to xMatters and browse service changes in the Service Catalog.