Every interface project plans obsessively for go-live: parallel runs, rollback plans, war rooms. Almost none of them plan for month seven — when the war room is disbanded, the project team has moved on, and an upstream vendor ships an update that quietly changes a code set. The message still parses. The interface still runs. The data is now subtly wrong.
Why month seven, not day one
Go-live failures are loud: messages reject, queues stop, someone calls. The dangerous failures are quiet, and they cluster around three events that reliably happen months later:
The upstream change nobody announced. An EHR upgrade, a lab analyzer replacement, a vendor “minor patch” — and a field you mapped now carries different values. Nothing errors; the mapping is simply no longer true.
The queue that backs up on a weekend. Volume grows, a certificate expires, a VPN flaps. If your first indication is a clinician reporting missing results on Monday, the interface has been failing for 60 hours.
The reprocessing decision under pressure. Messages failed; some were resent; some weren’t. Replaying the wrong window creates duplicate results in charts. Without a rehearsed runbook, this decision gets made at 2am by whoever answered the phone.
What operational readiness actually looks like
The fix isn’t better go-live testing — it’s treating the interface as a production service with an operational contract:
- Baseline metrics from day one: normal throughput, latency, and error rates per feed, so “weird” is measurable instead of anecdotal.
- Monitoring that pages on trend, not just failure — queue depth growing, volume dropping to zero during business hours, error rate drifting up.
- Runbooks written before the incident, including the reprocessing decision tree and who is authorized to make it.
- A named owner for every feed — on your team or under a service agreement, but never “the project team,” because the project team is gone by month seven.
This is the entire reason our managed services practice refuses to take a pager before the baseline phase: you can’t detect abnormal until you’ve written down normal. Interfaces are infrastructure. Fund them like it — the alternative is finding out in month seven that nobody did.