How do you respond when something really ungood happens to your digital services?
How do you marshal your resources when you need to assemble a team to respond to an issue where every minute is costing you money? Or reputation? Or customers?
xMatters has the answer.
We're adding adaptive incident management to the already amazing list of offerings in our world-class digital service availability platform. We'll be introducing this whole new approach to incident response and automation in just a few short weeks, but we're so excited that we just had to show it off now.
Our take on incidents
We've been researching and evaluating the best practices of SRE, ITIL v4, ICS, and elite IT organizations, and found a need to better support their incident processes. Because when an incident occurs, you need to resolve it as quickly as possible.
To respond to incidents faster during a crisis, you need to be able to:
- Prioritize incidents using age, status, severity, or a combination of factors.
- See at a glance what resources are available to fix the problem.
- Engage additional resources when required and dismiss the ones that aren’t needed anymore.
- Automate collaboration processes so resolvers know exactly how and where to communicate.
- Review detection and response metrics to reduce impact duration and time-to-engage.
- Evaluate workflows and revise resolution procedures to increase efficiency.
- Develop prevention measures to drive continuous improvement of your own process.
Find, evaluate, & collaborate
The Incidents list can help you find and prioritize the incidents that are important to you. The xMatters incident model lets you categorize incidents by how old they are, what their current status is, what impact they’re having on your services, and more.
You can filter the list of incidents to quickly locate the oldest ones or the ones with the most impact. And if you’re suffering a flood of incoming incidents, you can easily select and reject the duplicates.
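As a rough illustration of that kind of prioritizing and duplicate spotting, here is a minimal Python sketch. It does not use the xMatters API or data model: the `Incident` class, its fields, the `SEVERITY_RANK` ordering, and the `prioritize` and `find_duplicates` helpers are all hypothetical names invented for this example, and matching duplicates on the summary text is just one simple assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical severity ranking (lower number = more urgent); not an xMatters schema.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Incident:
    id: str
    summary: str
    severity: str
    status: str
    created: datetime


def prioritize(incidents):
    """Return unresolved incidents, most severe first, then oldest first."""
    open_incidents = [i for i in incidents if i.status != "resolved"]
    return sorted(open_incidents, key=lambda i: (SEVERITY_RANK[i.severity], i.created))


def find_duplicates(incidents):
    """Return incidents whose summary matches an earlier-created incident."""
    seen = set()
    duplicates = []
    for incident in sorted(incidents, key=lambda i: i.created):
        key = incident.summary.strip().lower()
        if key in seen:
            duplicates.append(incident)
        else:
            seen.add(key)
    return duplicates


now = datetime(2020, 8, 31, 12, 0)
incidents = [
    Incident("INC-1", "Checkout API errors", "high", "open", now - timedelta(hours=3)),
    Incident("INC-2", "Login page down", "critical", "open", now - timedelta(minutes=20)),
    Incident("INC-3", "checkout api errors", "high", "open", now - timedelta(hours=1)),
    Incident("INC-4", "Slow reports", "low", "resolved", now - timedelta(days=1)),
]

print([i.id for i in prioritize(incidents)])
print([i.id for i in find_duplicates(incidents)])
```

Sorting on a tuple of severity rank and creation time is what lets a "combination of factors" drive the order; swapping the tuple elements would put age ahead of severity instead.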
When you find the incident that requires your attention, drill through to the Incident Console to view its details.
The Incident Console puts all the critical information you need in one place.
- Right at the top, the summary and description provide enough detail for anyone to understand and evaluate the impact.
- The Status, Severity, and Impact Duration show what stage the investigation is at, how bad the problem is from a customer perspective, and how long it’s been going on.
- In the Resolvers area, you can easily see which resolvers are available, who’s already working the problem, and where a response is still pending.
- The Roles area lets you see who reported the incident and who’s currently in charge.
- The Collaboration area lists all the channels that the incident response team is using to work through the problem. This means the associated chat channel or conference bridge is only a click away.
- Meanwhile, the Timeline area keeps a live, detailed record of everything that’s going on. Whether that’s a resolver accepting the request to engage, a newly assigned commander, or a change to the incident’s severity, the Timeline automatically records the who, what, when, and why of every change.
Automate & guide
Once an incident commander has an incident to focus on, they need to be able to immediately engage resources to help them identify and triage the incident.
As understanding of an incident’s scope and nature grows, the incident commander might decide they need additional help. Engaging those resources as quickly as possible is extremely important, so the common setup tasks associated with incident management (such as creating conference bridges and chat channels or posting to status pages) need to be precise and immediate.
By automating those tasks, xMatters can do them practically instantaneously and eliminate the potential for manual errors – which are easy to make in the heat of battle. xMatters also automates the engagement process by using your on-call schedules to target the right resolvers on their preferred devices so you can stay focused on the problem at hand.
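To make the idea of schedule-driven engagement concrete, here is a small Python sketch of picking resolvers from an on-call schedule and pairing each with a preferred device. It is only an illustration under stated assumptions, not how xMatters implements it: the `Shift` class, the `SHIFTS` and `PREFERRED_DEVICE` data, the `on_call` and `engage` functions, and the fallback to email are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Shift:
    resolver: str
    start: time  # shift start (inclusive)
    end: time    # shift end (exclusive); may wrap past midnight


# Hypothetical on-call data; a real schedule would come from your scheduling tool.
SHIFTS = [
    Shift("alice", time(8, 0), time(20, 0)),
    Shift("bob", time(20, 0), time(8, 0)),
]
PREFERRED_DEVICE = {"alice": "mobile app", "bob": "sms"}


def on_call(shifts, at):
    """Return the resolvers whose shift covers the given moment."""
    now = at.time()
    result = []
    for shift in shifts:
        if shift.start <= shift.end:
            covered = shift.start <= now < shift.end
        else:  # overnight shift that wraps past midnight
            covered = now >= shift.start or now < shift.end
        if covered:
            result.append(shift.resolver)
    return result


def engage(shifts, devices, at):
    """Build (resolver, device) notification targets, defaulting to email."""
    return [(r, devices.get(r, "email")) for r in on_call(shifts, at)]


print(engage(SHIFTS, PREFERRED_DEVICE, datetime(2020, 8, 31, 23, 30)))
```

Because the lookup is a pure function of the schedule and the current time, the same call can run the instant an incident is opened, with no one needing to check a rota by hand.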
Evolve & improve
Incident management doesn’t end when an incident is resolved. You want to make sure you’re solving incidents faster over time, which means you need to:
- Normalize data to provide a consistent view of incidents
- Develop playbooks and response strategies to suit specific team and service needs
- Share strategies and workflows to promote consistency across processes
- Drive continuous improvement through comprehensive incident data and advanced analytics
The xMatters adaptive incident management model gives you a way to compare response and resolution information from one incident to the next so you can build metrics based on consistent data. We want to empower teams to create workflows and strategies that suit their needs, and allow them to develop efficiencies for faster, more accurate resolution. Once they’ve developed and enhanced these efficiencies, we want to enable them to share and promote their approach as best practices for other teams in their organization.
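As a sketch of why consistent data matters for those comparisons, the Python below computes two simple metrics from incident records that all share the same timestamps. Everything here is an assumption made for illustration: the record fields (`detected`, `engaged`, `resolved`), the `summarize` function, and the choice to measure both time-to-engage and impact duration from the detection time are hypothetical, not the xMatters definitions.

```python
from datetime import datetime
from statistics import mean

# Hypothetical normalized records: every incident carries the same three timestamps.
incidents = [
    {"id": "INC-1", "detected": datetime(2020, 8, 1, 10, 0),
     "engaged": datetime(2020, 8, 1, 10, 12), "resolved": datetime(2020, 8, 1, 11, 30)},
    {"id": "INC-2", "detected": datetime(2020, 8, 9, 14, 0),
     "engaged": datetime(2020, 8, 9, 14, 4), "resolved": datetime(2020, 8, 9, 14, 50)},
]


def minutes(start, end):
    """Elapsed time between two datetimes, in minutes."""
    return (end - start).total_seconds() / 60


def summarize(records):
    """Average time-to-engage and impact duration across incidents."""
    time_to_engage = [minutes(r["detected"], r["engaged"]) for r in records]
    impact = [minutes(r["detected"], r["resolved"]) for r in records]
    return {
        "mean_time_to_engage_min": mean(time_to_engage),
        "mean_impact_duration_min": mean(impact),
    }


print(summarize(incidents))
```

Running the same summary over last quarter's incidents and this quarter's gives a like-for-like comparison only because each record was captured in the same shape.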
With adaptive incident management, your organization can progress along a maturity model for incident resolution that leads to service resilience and continuous improvement. Let us help you take the first step toward incident management nirvana.
Availability (aka – So when can I get it?!)
Here are the dates you need to know:
- On August 31, 2020, we’ll be introducing the Incident Management Console in a limited release to new customer instances and to Early Access Program participants who have asked to be a part of this process.
If you’d like to have a chance to experience adaptive incident management in your EAP instance, contact your Customer Success Manager or reach out to Customer Support.
- For our upcoming Joust quarterly release (Oct/Nov 2020), we’ll be rolling out the Incident Management Console to our entire customer base.
If you want to stay up-to-date on adaptive incident management as we roll it out, be sure to click the Follow button at the top of this article!