OpenAPI schema drift: what it is, why it happens, and how to stop it

Schema drift is the gap between what your OpenAPI document says your API does and what your API actually does. The document claims `GET /users/{id}` returns a `User` with `email: string`. The server, three deploys ago, started returning `email: string | null`. Nothing in your CI objects. Your generated TypeScript SDK still types `email` as `string`. Six weeks later a customer's UI renders `Hello, undefined` and opens a ticket.

The frustrating property of drift is that it never announces itself. The code compiles, the tests pass, the build is green. The types lie quietly. Every team that generates clients from an OpenAPI document eventually meets this failure mode — usually the first time a field changes shape, and almost never during a planned migration.
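To make the failure concrete, here is a minimal sketch of how a stale generated type hides a nullable field. The `User` interface stands in for generator output, `parseUser` for a generated client method, and the JSON string for a server response; all three are illustrative assumptions, not code from a real SDK.

```typescript
// Stand-in for a type generated from the stale spec: email is non-nullable.
interface User {
  id: string;
  email: string;
}

// Stand-in for a generated client method: it parses the body and casts,
// which is the point where the compile-time promise stops being checked.
function parseUser(body: string): User {
  return JSON.parse(body) as User;
}

// What the server actually sends after the change.
const user = parseUser('{"id":"u_1","email":null}');

// The compiler accepts this assignment because the type says `string`...
const emailForDisplay: string = user.email;

// ...but at runtime the value is null, so the annotation is simply wrong.
const typeMatchesRuntime = typeof emailForDisplay === "string";
console.log(typeMatchesRuntime); // false
```

Nothing here fails to compile, which is the whole problem: the cast trusts the document, and the document is out of date.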

This guide covers what drift actually is (three flavours, not one), why disciplined teams still ship it, the manual fixes people try first, and the automated fix that actually holds. The last section explains how SDK Factory implements it — if you're building the same thing yourself, the mechanics are reusable.

Three kinds of drift, not one

Conversationally we say "the SDK drifted." Operationally there are three separate drifts and they have different causes and different fixes.

Spec-drift is when the OpenAPI document stops reflecting the server. The developer changed the server but not the spec — usually because the spec is hand-maintained as a side artifact. This is the most common flavour and the hardest to catch without behavioural testing: the spec can be wrong for months.

Server-drift is the inverse: the OpenAPI document is right, the server is wrong. Someone landed a server change that broke behaviour the spec still documents correctly, and now the server returns shapes the spec forbids. Contract tests catch this; nothing in the SDK world does.

SDK-drift is when the OpenAPI document is correct, the server matches it, but the generated SDK is older than both — because nobody regenerated it after the last spec change. The type safety is real; it's just referring to the wrong version of reality.

Why disciplined teams still ship drift

The usual pattern is: codegen is wired into CI, the first run goes fine, and for the next year the regeneration is "manual when we remember." Nobody sets up a scheduled regeneration because the generated diff requires review, and the review is tedious, and the diff is almost always uncontroversial — which means the review feels like pure ceremony, which means it drops off the calendar.

Even teams who do schedule regeneration commonly miss small-interval drift. A spec change at 10 a.m., a regeneration cron at midnight: fourteen hours where consumers get stale types. That gap is harmless for most changes and catastrophic for a handful (a removed endpoint, a renamed field, a required parameter becoming optional and changing meaning).

The structural problem is that schema updates are a push event (the spec changed) and regeneration is a pull process (a cron wakes up, re-runs, and waits for a reviewer). The feedback loop is too long for anything fast-moving, and the solution is not just "a tighter cron": it's removing the human step that makes the loop slow.

Manual fixes that almost work

Schema-first discipline: mandate that every server change starts with a spec change, code review the spec diff, generate the server stubs from the spec. This is genuinely good practice and catches a lot of drift at the source. It doesn't catch the case where the spec change is itself wrong — the reviewer said LGTM on a shape nobody verified — and it doesn't help SDK-drift at all.

Scheduled regeneration: a cron or GitHub Action runs the generator every N hours, opens a PR, waits for review. Works well for teams with a slow-moving spec and a culture of merging green PRs fast. Falls apart under spec churn (the queue piles up) and under incident pressure (nobody reviews an SDK PR while firefighting).

Schema contract tests: run a test suite against the real server that verifies each documented response shape matches reality. Catches spec-drift and server-drift in CI. Doesn't help SDK-drift. Requires test infrastructure and stays honest only if new endpoints are added to the suite immediately.
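As a rough illustration of what one such check does, the sketch below compares an observed payload against a documented shape and reports mismatches. The `ObjectSchema` type is a deliberately tiny, hypothetical subset of an OpenAPI 3.0 schema object (real documents also use `$ref`, arrays, `oneOf`, formats and more), and `findViolations` is an illustrative helper, not a library API.

```typescript
// Tiny subset of an OpenAPI 3.0 schema object (assumption for this sketch).
// `nullable` is the 3.0 keyword; 3.1 documents express this with type arrays.
interface ObjectSchema {
  required: string[];
  properties: Record<string, { type: "string" | "number" | "boolean"; nullable?: boolean }>;
}

// Returns one readable violation per mismatch; an empty array means the payload conforms.
function findViolations(schema: ObjectSchema, payload: Record<string, unknown>): string[] {
  const violations: string[] = [];
  for (const [name, prop] of Object.entries(schema.properties)) {
    const value = payload[name];
    if (value === undefined) {
      if (schema.required.includes(name)) violations.push(`${name}: missing required field`);
      continue;
    }
    if (value === null) {
      if (!prop.nullable) violations.push(`${name}: null but not declared nullable`);
      continue;
    }
    if (typeof value !== prop.type) {
      violations.push(`${name}: expected ${prop.type}, got ${typeof value}`);
    }
  }
  return violations;
}

// The documented shape: email is a required, non-nullable string.
const userSchema: ObjectSchema = {
  required: ["id", "email"],
  properties: { id: { type: "string" }, email: { type: "string" } },
};

// In a real suite this payload would come from a request to the live server.
const observed = { id: "u_1", email: null };
const violations = findViolations(userSchema, observed);
console.log(violations); // ["email: null but not declared nullable"]
```

A production suite would more likely lean on an existing JSON Schema validator; the point of the sketch is only that the check runs against real responses, not against the document alone.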

Each of these moves the problem one step and leaves residue. The residue is specifically the SDK-drift flavour — even a team with perfect schema discipline and a perfect contract suite can have a stale published SDK if nobody re-ran the generator after last Tuesday's spec change.

The automated fix: polling, canonical hashing, auto-rebuild

The loop that actually closes the SDK-drift gap is: fetch the spec on a short cycle, normalise it, hash it, compare to the previous hash. Same hash → nothing shipped. Different hash → trigger a rebuild, publish the new version. No human in the loop.

The load-bearing step is the canonicalisation. Serialisers reorder keys. Formatters rewrite whitespace. Merge tooling shuffles lines. If you hash the raw bytes, you'll trigger a rebuild every time your linter saves the file, and the team will disable the loop within a week. The correct hash is: keys sorted deterministically, `$ref` chains resolved or stabilised, insignificant whitespace stripped (whitespace inside string values is content and stays), then SHA-256.
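A minimal sketch of that canonicalise-then-hash step, assuming the document is served as JSON (a YAML spec would need parsing with a YAML library first) and leaving `$ref` stabilisation out entirely. The names `sortKeys`, `canonicalise` and `sha256` are illustrative; the hashing uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rebuild the value with object keys in sorted order at every depth.
// Array order is left alone, since reordering arrays could change meaning.
function sortKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Canonical form: parsing drops formatting, then keys are sorted and the
// result is re-serialised compactly. String contents are untouched.
function canonicalise(body: string): string {
  return JSON.stringify(sortKeys(JSON.parse(body) as Json));
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Same document, different key order and whitespace.
const compact = '{"openapi":"3.0.3","info":{"title":"Users","version":"1.2.0"}}';
const pretty = '{\n  "info": { "version": "1.2.0", "title": "Users" },\n  "openapi": "3.0.3"\n}';
// A real change: the version was bumped.
const bumped = '{"openapi":"3.0.3","info":{"title":"Users","version":"1.3.0"}}';

const hashCompact = sha256(canonicalise(compact));
const hashPretty = sha256(canonicalise(pretty));
const hashBumped = sha256(canonicalise(bumped));
console.log(hashCompact === hashPretty, hashCompact === hashBumped); // true false
```

The design choice worth noting is that the hash is taken over the re-serialised value, never over the fetched bytes, so a formatter run cannot change it.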

Versioning comes from the document's `info.version` field, not from a diff heuristic. Diff-to-semver is wrong often enough that teams who tried it generally turn it off: renames read as "added + removed", optional-becoming-required reads as additive, and deprecations read as no-change. Let the spec author own the version number.
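Reading the author's version can be as small as the sketch below. It assumes the document is JSON text and that a missing or empty `info.version` should block the publish rather than be guessed at; that policy, and the error messages, are assumptions of this example.

```typescript
// Reads the author-controlled version from an OpenAPI document (JSON text).
// Throws instead of inventing a version, so a bad document blocks the publish.
function readInfoVersion(canonical: string): string {
  const doc: unknown = JSON.parse(canonical);
  if (typeof doc !== "object" || doc === null) {
    throw new Error("OpenAPI document is not a JSON object");
  }
  const info = (doc as { info?: unknown }).info;
  if (typeof info !== "object" || info === null) {
    throw new Error("OpenAPI document has no info object");
  }
  const version = (info as { version?: unknown }).version;
  if (typeof version !== "string" || version.trim() === "") {
    throw new Error("info.version must be a non-empty string");
  }
  return version;
}

const sdkVersion = readInfoVersion(
  '{"openapi":"3.0.3","info":{"title":"Users","version":"2.4.1"}}'
);
console.log(sdkVersion); // 2.4.1
```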

drift-loop.ts (TypeScript sketch)

// App and the helpers called below are assumed to be defined elsewhere.
async function pollForDrift(app: App): Promise<void> {
    const response = await fetch(app.schemaUrl);
    if (!response.ok) return;                         // never hash an error page; retry next cycle
    const body = await response.text();
    const canonical = canonicalise(body);             // sort keys, stabilise $refs
    const hash = sha256(canonical);                   // deterministic fingerprint

    if (hash === app.lastSeenHash) return;            // no drift, stop here

    const version = readInfoVersion(canonical);       // info.version from the document
    const tarball = await buildSdk(canonical);        // generator runs here
    await publishToRegistry(app.registry, tarball, version);
    // Assumed to persist `hash` as the app's new lastSeenHash.
    await recordDeployment(app.id, { hash, version, tarball });
}

How SDK Factory closes the loop

SDK Factory runs exactly this loop as a hosted service. Each app carries a schema URL and a last-seen hash. A poll cycle fetches, canonicalises, hashes, and compares. On a match, nothing ships. On a mismatch, a build task runs the TypeScript generator, writes the tarball to S3 for audit, and publishes to the configured registry.

The trade we're making versus a hand-rolled version: you don't need to maintain the canonicaliser, the hash storage, the build queue, the publish retry, the registry auth, the audit trail. You do need to make your OpenAPI document reachable from our infrastructure — which is a sentence instead of a quarter of someone's time.
