On May 28, 2024, at 10:45 CET, an incident was detected that affected all users of the Dynamics & Library system in the West Europe (Production 1) environment. The issue degraded system performance, causing users to experience slow responses or timeouts.
The engineering team quickly determined that the degraded performance was caused by heavy load on the SQL server from a reindexing operation, which was part of a migration the team was rolling out. At 11:00 CET, as an immediate mitigation, the engineering team began increasing the SQL server's capacity. By 11:05 CET, the additional resources had been allocated to the SQL server, application performance had returned to normal, and users were no longer impacted. At 12:37 CET, the incident was resolved once the engineering team had completed the migration and confirmed that it was working as expected.
We are reviewing and improving our internal migration procedures to prevent similar issues in the future.