Downtime
On February 17th 2020, Templafy experienced downtime.
Intermittent access problems affected 2-3% of requests to the Templafy platform.
The note below explains what happened.
Summary of the incident on February 17th 2020
During the incident, some users received an error message when attempting to access the solution. The errors shown included HTTP status codes 502, 503 and 403, as well as the message “The service is unavailable”.
Root cause
The problem proved to be related to the scheduled maintenance on Saturday, February 15th, during which new IP addresses were added to our app service.
As part of this maintenance, we upgraded our service plan and added a CNAME DNS record, as recommended by Microsoft, to mitigate performance instability caused by a bug in Microsoft Azure’s courtesy warm-up when scaling.
With this upgrade, our inbound and outbound IP addresses changed automatically, and the IP address configured in our DNS failed to respond to 2-3% of all requests.
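The mismatch described above, between the addresses published in DNS and the addresses the app service actually holds, can be checked mechanically. The following is a minimal sketch of such a check, not the tooling used during the incident. The function names are hypothetical, and the list of current service addresses is assumed to be obtained separately, for example from the hosting portal.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def find_stale_ips(dns_ips, service_ips):
    """Return addresses published in DNS that the service no longer holds.

    dns_ips:     addresses the public hostname resolves to
    service_ips: addresses currently assigned to the app service
                 (assumed to be supplied by the operator)
    """
    return sorted(set(dns_ips) - set(service_ips))
```

Calling `find_stale_ips(resolve_ipv4(hostname), current_service_ips)` after a plan change would list any DNS entries that still point at addresses the service has given up. An empty list means DNS and the service agree.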
Our availability monitors did not report the failed requests after the scheduled maintenance on Saturday, February 15th, because of the newly added CNAME DNS record, which had been added on Microsoft’s recommendation.
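An intermittent failure affecting only 2-3% of requests is easy to miss with a single pass/fail probe. One way to make a monitor sensitive to it is to sample many requests and alert on the share that fail. The sketch below illustrates that idea only. It is not a description of our monitoring setup, the 1% alert threshold is an assumed value, and whether such a check would have caught this incident depends on how the monitor's requests were routed.

```python
# Status codes treated as failures: the ones reported during the incident.
FAILURE_CODES = {403, 502, 503}


def failure_rate(status_codes):
    """Return the fraction of sampled responses that carry a failure code."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("at least one sample is required")
    failed = sum(1 for code in codes if code in FAILURE_CODES)
    return failed / len(codes)


def should_alert(status_codes, threshold=0.01):
    """Alert when the failure rate reaches the threshold (assumed: 1%)."""
    return failure_rate(status_codes) >= threshold
```

With 100 sampled responses of which three are 502, 503 and 403, `failure_rate` returns 0.03 and `should_alert` returns `True`.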
Resolution
To resolve the issue, we downgraded and then upgraded our Azure service plan so that new IP addresses were automatically allocated to our app service. The allocation took 20 minutes to be reflected in Azure, after which we updated our DNS. The DNS change took one hour to fully propagate to all users.
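Waiting for a DNS change to take effect can be sketched as a polling loop that repeatedly resolves the hostname until only the new addresses are returned. This is an illustrative sketch under stated assumptions, not the procedure followed during the incident. The function name `wait_for_dns`, the one-hour timeout and the 60-second interval are hypothetical choices, and the `resolve` callable is supplied by the caller. A check like this only reflects the view of the resolver it runs against, not of every user.

```python
import time


def wait_for_dns(resolve, expected_ips, timeout_s=3600, interval_s=60,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until resolve() returns only addresses from expected_ips.

    resolve:      zero-argument callable returning an iterable of addresses
    expected_ips: the newly allocated addresses
    Returns True once the resolver sees only new addresses,
    or False if the timeout elapses first.
    """
    expected = set(expected_ips)
    deadline = clock() + timeout_s
    while True:
        current = set(resolve())
        # Succeed only when something resolved and nothing old remains.
        if current and current <= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting.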