Atlas Apex

Incident

Microsoft Entra ID July 2024 Outage: The IdP as Single Point of Failure

Incident · Microsoft service health post-mortem · Jul 2024

Key Finding

Federation centralises trust. When the trust anchor is unavailable, every downstream relying party that depends on it is unavailable too. Identity resilience is not the IdP's uptime — it is the relying party's ability to function when the IdP is down.

Microsoft published multiple Entra ID and Azure service-disruption post-mortems through 2024 and 2025. The July 2024 outage was one of the more visible because it fell in the same week as the CrowdStrike Falcon content-update incident. Both events stressed the same identity-recovery muscles in many enterprises and exposed how few had practised the scenario.

The pattern is by now well known. A regional or global Entra ID issue blocks new sign-ins. Existing sessions keep working until they expire. Relying parties that depend on token refresh or fresh authentication start to fail. Conditional Access decisions stop being evaluated. And because admin paths into the affected services use the same IdP, the recovery actions an operations team would normally take are themselves blocked.
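The session-expiry part of this pattern can be made concrete with a small model. The following is a minimal sketch, not a description of how Entra ID or any particular relying party behaves: the `Session` type, the function names, and the assumption that a session is usable exactly while its token is unexpired (or while the IdP can refresh it) are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Session:
    """Hypothetical record of an authenticated session at a relying party."""
    user: str
    token_expires_at: datetime


def usable(session: Session, now: datetime, idp_available: bool) -> bool:
    """Assumed rule: an unexpired token keeps working; an expired one
    needs the IdP to refresh or re-authenticate."""
    if now < session.token_expires_at:
        return True
    return idp_available


def surviving_sessions(
    sessions: List[Session], outage_start: datetime, elapsed: timedelta
) -> List[str]:
    """Users whose sessions still work `elapsed` into an IdP outage."""
    now = outage_start + elapsed
    return [s.user for s in sessions if usable(s, now, idp_available=False)]
```

Run against a real session inventory, a model like this gives a rough answer to "how long after the IdP fails do our users start to lose access?", which is the number an impact-tolerance statement needs.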

For identity architecture, the right framing is that the IdP is a critical resilience asset, not a SaaS commodity. The question to design against is not "what is the IdP's SLA?" but "what does our organisation continue to do when the IdP is unavailable?" The answer includes break-glass paths that do not depend on the same IdP, BitLocker and admin recovery-key escrow that survives a federation outage, and impact-tolerance modelling for identity itself under the operational-resilience regimes (DORA, BCBS principles) that increasingly demand it.
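One way to test whether a break-glass or key-escrow path really is independent of the IdP is to walk its dependencies transitively. The sketch below assumes a hand-maintained dependency map; every node name in `EXAMPLE_GRAPH` is hypothetical and stands in for whatever an organisation's own inventory contains.

```python
from typing import Dict, Iterable, List, Set

# Maps each component to the components it directly depends on.
DependencyGraph = Dict[str, Set[str]]


def transitive_dependencies(graph: DependencyGraph, node: str) -> Set[str]:
    """Return everything `node` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def paths_blocked_by(
    graph: DependencyGraph, recovery_paths: Iterable[str], failed: str
) -> List[str]:
    """Recovery paths that stop working when `failed` is unavailable."""
    return sorted(
        p for p in recovery_paths if failed in transitive_dependencies(graph, p)
    )


# Hypothetical inventory: the escrow looks independent until its key
# store turns out to authenticate against the primary IdP.
EXAMPLE_GRAPH: DependencyGraph = {
    "admin-portal": {"primary-idp"},
    "recovery-key-escrow": {"cloud-key-store"},
    "cloud-key-store": {"primary-idp"},
    "break-glass-account": {"hardware-token", "local-credential-store"},
}
EXAMPLE_PATHS = ["admin-portal", "recovery-key-escrow", "break-glass-account"]
```

With this example data, `paths_blocked_by(EXAMPLE_GRAPH, EXAMPLE_PATHS, "primary-idp")` flags both the admin portal and the key escrow, leaving only the break-glass account. The indirect dependency is the case most likely to be missed in a manual review.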
