Perspective
The Okta Tenant Resilience Gap
Key Finding
Most identity tenants run without a backup, without point-in-time recovery, and without drift detection against a known-good baseline. The vendor is not going to fix that for you.
We see the same gap on most engagements. The identity platform runs in someone else's cloud — Okta, Entra ID, Auth0 — and the running assumption is that someone else is looking after the resilience too. They are not, at least not in the way the rest of the security estate expects.
What the major identity platforms give you natively:
- Service-level uptime SLAs covering platform availability
- Recent admin actions in the system log, with retention measured in tens of days
- Limited ability to undo specific actions in the admin UI
What they typically do not give you:
- A point-in-time tenant restore to last Tuesday's configuration
- A diff between today's tenant and a known-good baseline
- Forensic-grade audit history measured in years
- The ability to surgically restore one policy, one group, or one user without touching the rest
The first time an organisation discovers this is usually under pressure. A misconfigured access policy locks out an executive group. A federation trust gets edited by the wrong contractor. An attacker with a stolen admin session deletes user accounts to slow down the response. A junior admin pastes the wrong group into a privileged role.
In each scenario, the question is the same: can we get back to a known-good state in minutes, or are we starting from yesterday's screenshots and forensic guesswork?
Three architectural patterns that close the gap:
Baseline as code. The intended tenant state (policies, groups, role bindings, federation trusts, lifecycle rules) held in version control, with drift detection running continuously against the live tenant. The drift report tells you what changed and who changed it, relative to your last approved baseline.
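As a rough illustration of the drift check, the sketch below compares a baseline against a live export and reports what was added, removed, or changed. It assumes both sides have already been flattened into dictionaries keyed by a stable object ID; the function name `detect_drift`, the key format, and the sample objects are hypothetical, and fetching the live state from the tenant is out of scope here.

```python
from __future__ import annotations

from typing import Any


def detect_drift(
    baseline: dict[str, dict[str, Any]],
    live: dict[str, dict[str, Any]],
) -> dict[str, list]:
    """Compare two tenant exports keyed by a stable object ID (assumed format)."""
    added = sorted(live.keys() - baseline.keys())
    removed = sorted(baseline.keys() - live.keys())
    changed = []
    for object_id in sorted(baseline.keys() & live.keys()):
        before, after = baseline[object_id], live[object_id]
        # List only the fields whose values differ between baseline and live.
        fields = sorted(
            f for f in before.keys() | after.keys() if before.get(f) != after.get(f)
        )
        if fields:
            changed.append((object_id, fields))
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: the approved baseline from version control...
baseline = {
    "policy:mfa-admins": {"status": "ACTIVE", "factor": "webauthn"},
    "group:finance": {"members": ["alice", "bob"]},
}
# ...and an export of the live tenant taken later.
live = {
    "policy:mfa-admins": {"status": "INACTIVE", "factor": "webauthn"},
    "group:contractors": {"members": ["mallory"]},
}

report = detect_drift(baseline, live)
```

A diff like this only answers "what changed". Attributing the change to an actor would mean correlating each drifted object with the platform's admin log, which this sketch does not attempt.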
Point-in-time recovery, surgical. Continuous backup of the tenant state with the ability to restore one object — not the whole tenant — to its state at a specific timestamp. Whole-tenant restore is a sledgehammer; surgical restore is what you actually want in an incident.
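To make the difference between surgical and whole-tenant restore concrete, here is a minimal sketch of a per-object version store. It assumes snapshots are recorded per object with a timestamp; `SnapshotStore`, `restore_object`, and the sample objects are hypothetical names for illustration, and writing the restored state back to a real tenant (through the platform's admin API) is deliberately left out.

```python
from __future__ import annotations

import bisect
import copy
from datetime import datetime, timezone
from typing import Any


class SnapshotStore:
    """Keeps every recorded version of each object, ordered by timestamp."""

    def __init__(self) -> None:
        self._versions = {}  # object_id -> list of (taken_at, state)

    def record(self, object_id: str, taken_at: datetime, state: dict[str, Any]) -> None:
        versions = self._versions.setdefault(object_id, [])
        versions.append((taken_at, copy.deepcopy(state)))
        versions.sort(key=lambda v: v[0])

    def state_at(self, object_id: str, when: datetime) -> dict[str, Any] | None:
        """Return the latest version recorded at or before `when`, if any."""
        versions = self._versions.get(object_id, [])
        times = [taken_at for taken_at, _ in versions]
        idx = bisect.bisect_right(times, when)
        if idx == 0:
            return None
        return copy.deepcopy(versions[idx - 1][1])


def restore_object(live: dict, store: SnapshotStore, object_id: str, when: datetime) -> dict:
    """Return a copy of `live` with exactly one object rolled back to `when`."""
    state = store.state_at(object_id, when)
    if state is None:
        raise KeyError(f"no snapshot of {object_id} at or before {when.isoformat()}")
    restored = dict(live)  # every other object is carried over untouched
    restored[object_id] = state
    return restored


utc = timezone.utc
store = SnapshotStore()
store.record("policy:mfa-admins", datetime(2024, 1, 2, 9, 0, tzinfo=utc), {"status": "ACTIVE"})
store.record("policy:mfa-admins", datetime(2024, 1, 9, 9, 0, tzinfo=utc), {"status": "INACTIVE"})

live = {
    "policy:mfa-admins": {"status": "INACTIVE"},
    "group:finance": {"members": ["alice"]},
}
# Roll back only the policy to its state on 5 January; the group is not touched.
restored = restore_object(live, store, "policy:mfa-admins", datetime(2024, 1, 5, tzinfo=utc))
```

Returning a new dictionary rather than mutating `live` keeps the pre-restore state available for comparison, which is useful when the restore itself needs to be reviewed.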
Forensic audit history. Admin actions retained for years, not weeks, in a store outside the platform that generated them. This is what the supervisory authority will ask for, what your insurer will ask for, and what the post-incident review will ask for.
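One way an external audit store can support forensic use is by making tampering detectable. The sketch below chains each archived event to the previous one with a SHA-256 hash. Hash chaining is our illustrative choice here, not something the platforms provide, and the event fields and function names are hypothetical; a real archive would also need durable, access-controlled storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Serialise deterministically so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> dict:
    """Append an admin event, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


# Illustrative events; the field names are assumptions, not a platform schema.
archive = []
append_event(archive, {
    "published": "2024-01-09T09:00:00Z",
    "actor": "contractor-17",
    "action": "policy.deactivate",
    "target": "policy:mfa-admins",
})
append_event(archive, {
    "published": "2024-01-09T09:04:00Z",
    "actor": "contractor-17",
    "action": "group.membership.add",
    "target": "group:finance",
})
```

Calling `verify_chain(archive)` before handing records to a reviewer gives a quick integrity check: it fails if any archived event was altered after it was written.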
We deliver this through Acsense as part of our identity-resilience practice. The tenant stays where it is; the resilience layer runs alongside it. The day you need it, you stop guessing.