Service degradation: Slow Access on West Europe (Production 1)

Incident Report for Templafy

Postmortem

On May 28, 2024, at 10:45 CEST, an incident was detected that affected all users of the Dynamics & Library system in the West Europe (Production 1) environment. System performance was degraded, and users experienced slow responses or timeouts.

The engineering team quickly determined that the degraded performance was caused by heavy load on the SQL server from a reindexing operation, which was part of a migration the team was rolling out. At 11:00 CEST, as an immediate mitigation, the team began increasing the capacity of the SQL server. By 11:05 CEST, the additional resources had been allocated; application performance returned to normal, and users were no longer impacted. By 12:37 CEST, the incident was resolved after the team applied the migration and confirmed it was working as expected.

We are reviewing and improving our internal migration procedures to prevent similar issues in the future.

Posted May 29, 2024 - 14:34 CEST

Resolved

The incident has been resolved, and further information will be provided in a postmortem shortly.

We apologize for the impact on affected customers.

Posted May 28, 2024 - 12:37 CEST

Monitoring

The incident has been successfully mitigated, and our team is actively monitoring the situation to ensure ongoing stability and performance. We are observing the systems to prevent any further disruptions.

Posted May 28, 2024 - 11:05 CEST

Identified

We have identified an issue that affects a subset of customers and are working towards a resolution.
Further updates will be posted here soon.

Posted May 28, 2024 - 10:45 CEST

This incident affected: Templafy Hive (Library & Dynamics).