Maintenance notice: hosting service improvements

Updated April 17, 2023 18:14

Don Clark

As part of our ongoing effort to improve the customer experience, we're almost ready to start upgrading our hosting services.

What are we doing?

This work is intended to keep our infrastructure as a service provider up to date with current technologies, and to keep our service robust and healthy. For a full explanation of what we're doing and why, refer to this article: Improving our hosting services.

When are we doing it?

Our customers in the EMEA and APAC regions are already enjoying the benefits of these changes, and our original schedule had our North American customers joining them in late January.

After much discussion and consultation with our customers, Engineering team, Support staff, and management, we've decided to fast-track the schedule, and move up the maintenance by a few weeks.

We're doing this for a number of reasons:

First and foremost, you asked for it! We recently had a service disruption in one of our North American data centers that had an unfortunately wide-ranging impact. This issue would simply not have occurred if the hosting service had already been updated. For the sake of those customers (and our hard-working Engineering and Support teams), we felt the time was right to shorten the timeline and make these changes ASAP.
We want to be able to get features in front of customers faster. There are several new and upcoming features that rely on the technology available in the new hosting service, such as advanced processing of targeted recipient failures and some extra functionality for the new All Events report. These features either behave differently for North American customers than they do for customers in other regions, or they are only available in their entirety in the upgraded regions.
You also asked for more time to test the changes in your non-production environments. When we performed these updates in other regions, the feedback we received was that the schedule didn't leave much time between the updates to non-production and production systems. Some customers wanted more time to test things like IP filtering, email delivery, and other components that could be affected. The new schedule provides more time to test these things, during what is traditionally a slower-traffic time of the year.
We've sharpened our process and we feel we're ready to take the next step. We've already performed these upgrades successfully in other regions, and we feel that we've gained enough necessary knowledge and practical experience to make the final changes right away.

Here is the new schedule for this maintenance:

Region	Non-Production	Production
Europe, Middle East, and Africa - Completed!	May 8-11, 2018	May 22-25, 2018
Asia Pacific - Completed!	October 16-18, 2018	October 23-25, 2018
Americas - Completed!	December 11-14, 2018*	December 17-21, 2018**; January 7-11, 2019; January 14-18, 2019

* Except customers with legacy integrations or certain requirements.
** Applies to a specific set of migration-ready customers only.

Note: Some customers with unique requirements or concerns will be upgraded on their originally scheduled dates.

How will we communicate with you?

While our process will be similar to the recent database updates we performed, we learned a few things along the way and we're taking your feedback to heart. We'll be modifying our process slightly to make things even smoother and more hassle-free.

Well ahead of time, we'll provide you with a specific maintenance window during which you can expect us to implement the changes for your instance.

Then, we will notify you again for your specific downtime - both when your service is about to go down, and again once the maintenance has been completed and your service has been restored.

Also, to help reduce confusion, we won't be posting a maintenance notice on the xMatters status page each day. (That way you won't get reminded about a maintenance window in your region that may or may not affect you.)

How will this affect you?

Within your maintenance window, the service may be unavailable for up to 30 minutes. (Most customers will experience significantly less.) During this time, notification and response processing is not guaranteed.

Please also note the following:

Four hours before the start of your maintenance window, we will pause all EPIC synchronization, user upload, voice recording, and archive/purge processes. (If you attempt to initiate any of these processes within this time or during the maintenance window, your changes may not be persisted.)
Events in progress at the time of the maintenance may be disrupted, meaning that notifications and responses may not be processed and the events may be delayed or lost.
Scheduled triggers or messaging will be delayed until the service is restored.
The IP ranges that xMatters uses will change once the maintenance on your instance is completed. If you are using IP filtering, you will need to update the IP ranges in your systems or you may experience further service disruptions. Customer Support can provide you with the complete list of IP ranges prior to the maintenance. For more information about IP filtering and why we don't actually recommend it, see this article: xMatters IP ranges.
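
To make the timing in the first item above concrete, here is a minimal Python sketch that checks whether a given moment falls inside the period when those background processes are paused. The four-hour lead time comes from this notice; the function name, the example window, and the assumption that the pause lasts until the end of the window are illustrative only. Your actual window is the one we send you.

```python
from datetime import datetime, timedelta, timezone

# Lead time stated in this notice: processes pause four hours before the window.
PAUSE_LEAD = timedelta(hours=4)


def is_paused(now, window_start, window_end):
    """Return True if `now` falls in the pause period.

    Assumption for illustration: the pause runs from four hours before
    the window starts until the window ends.
    """
    return window_start - PAUSE_LEAD <= now <= window_end


# Hypothetical window, not a real schedule.
start = datetime(2018, 12, 17, 6, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=2)

# 01:00 UTC is five hours before the window, so the pause has not begun.
print(is_paused(datetime(2018, 12, 17, 1, 0, tzinfo=timezone.utc), start, end))  # False
# 03:00 UTC is three hours before the window, so the pause is in effect.
print(is_paused(datetime(2018, 12, 17, 3, 0, tzinfo=timezone.utc), start, end))  # True
```

A check like this could sit in front of a scheduled user upload or synchronization job so that it is deferred rather than started during the pause.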

What will happen to your incoming events?

If an event enters the system during that brief moment your instance is offline for maintenance, one of two things will happen:

If it comes in through a legacy integration (using an APXML request via the Integration Agent), the event will simply be placed in the queue until the heartbeat between the Integration Agent and xMatters is restored. When the connection is reestablished, the Integration Agent will submit the events in the same order as they entered the system.
If it's an Integration Agent event targeting a communication plan via REST, the request will return a 503 error that we've set up specifically for this maintenance. The error instructs the Integration Agent to retry every ten minutes, up to three times. If another event enters the system after this, xMatters will place it in the queue for processing behind the first event, and then process them in order once the retry is successful. (This does mean that xMatters might not process the original event immediately when the service comes back online, depending on the exact timing between the 503 error, the retry interval, and the restoration of the service.)
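
If you submit events from your own client rather than through the Integration Agent, you may want similar retry behavior. The following Python sketch shows one way to retry on a 503 using the interval and retry count described in this notice. It is not the Integration Agent's implementation: `submit_with_retry` and the `send` callable are hypothetical names, and `send` stands in for whatever HTTP client you use.

```python
import time

RETRY_INTERVAL_SECONDS = 10 * 60  # ten minutes, per this notice
MAX_RETRIES = 3                   # up to three retries, per this notice


def submit_with_retry(send, payload, sleep=time.sleep):
    """Call send(payload) and retry while it returns HTTP 503.

    `send` is a hypothetical callable that posts the event and returns
    the HTTP status code. Returns the final status and the retry count.
    """
    status = send(payload)
    retries = 0
    while status == 503 and retries < MAX_RETRIES:
        sleep(RETRY_INTERVAL_SECONDS)  # wait before trying again
        retries += 1
        status = send(payload)
    return status, retries


# Simulated service: unavailable for the first two attempts, then restored.
responses = iter([503, 503, 200])
status, retries = submit_with_retry(
    lambda _payload: next(responses),
    {"event": "demo"},
    sleep=lambda _seconds: None,  # skip the real wait in this demonstration
)
print(status, retries)  # 200 2
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use you would leave the default so that each retry waits the full ten minutes.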

How do you get help?

If you have any questions about this maintenance, use our online form to submit a request to xMatters Customer Support. Our support agents are always available, and they will always have access to the most up-to-date information.

Comments

1 comment

Don Clark

December 11, 2018 19:06

Updated with our new schedule.