Incident Initiation:
On October 30, 2024, at 11:30 AM CET, an issue was introduced in the Hive environment that affected the login process for web add-ins. The issue was detected on October 31, 2024, at 10:00 PM CET, when users reported being redirected to an error page during login attempts. The disruption prevented users from accessing essential functionality and interrupted their daily work. It also led to a surge in support queries and a notable increase in error rates in system logs.
Investigation:
The engineering team began its investigation on November 1, 2024, at 1:30 PM CET. The team hypothesized that a recent deployment of a library update had caused the authentication failure in web add-ins. Further investigation confirmed this hypothesis and identified the update as the root cause of the login issues.
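The intervals between the timestamps reported above can be checked with a short calculation. The following is a minimal sketch that uses only the three timestamps stated in this report; the variable and function names are illustrative and do not come from any Hive tooling. It assumes a fixed UTC+1 offset for CET, which holds for all three dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is modeled as a fixed UTC+1 offset, which is valid for
# these dates because European summer time ended on October 27, 2024.
CET = timezone(timedelta(hours=1), "CET")

# Timestamps taken from this report (names are illustrative).
introduced = datetime(2024, 10, 30, 11, 30, tzinfo=CET)
detected = datetime(2024, 10, 31, 22, 0, tzinfo=CET)
investigation_started = datetime(2024, 11, 1, 13, 30, tzinfo=CET)


def hours_between(start: datetime, end: datetime) -> float:
    """Return the elapsed time from start to end in hours."""
    return (end - start).total_seconds() / 3600


time_to_detect = hours_between(introduced, detected)
time_to_investigate = hours_between(detected, investigation_started)

print(f"Introduced -> detected: {time_to_detect:.1f} h")                    # 34.5 h
print(f"Detected -> investigation started: {time_to_investigate:.1f} h")    # 15.5 h
```

By this calculation, the issue was live for 34.5 hours before it was detected, and a further 15.5 hours passed before the investigation began.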
Mitigation and Resolution:
To mitigate the incident, the engineering team reverted the library update that triggered the authentication failure. The rollback was first deployed to the testing environment, where rigorous testing confirmed that it resolved the issue. The team then deployed the fix to all production clusters, which restored the login process and allowed users to access web add-ins normally.
Impact and Scope:
The incident impacted multiple tenants across various clusters and caused significant workflow disruptions for affected users. All users who attempted to log in to web add-ins during the incident window were affected, and support requests increased as a direct consequence.
Post-Incident Actions:
In response to this incident, the engineering team will implement several post-incident actions, including a detailed review of update processes to prevent future authentication-related disruptions. The team will also make procedural improvements to monitor critical dependencies more closely, particularly those used in authentication workflows.