The End of Manual Certificate Rotation
Certificate expiration outages are almost always preventable. The challenge is treating certificate lifecycle management as a repeatable system instead of a collection of manual tasks.
Everyone has a certificate expiration story.
Mine usually involve TLS errors, confused service owners, old runbooks, and somebody asking, “Wait, who owns this cert?”
The thing I’ve learned is that expired certificates are rarely the real problem. The real problem is that most organizations don’t have a complete picture of their certificate lifecycle.
Ask a few simple questions:
- Which certificates expire in the next 90 days?
- Who owns them?
- What services use them?
- How are they rotated?
- How do you know the new certificate is actually being served?
A surprising number of teams can’t answer all of those consistently.
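To make those questions concrete, here is a minimal sketch of an inventory record and two queries over it. Everything in it is an assumption for illustration: the `CertRecord` fields, the helper names, and the idea that the inventory is a simple list of records rather than whatever your real source of truth turns out to be.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class CertRecord:
    """One row of a hypothetical certificate inventory."""
    name: str
    not_after: datetime                  # expiration time, timezone-aware (UTC)
    owner: Optional[str] = None          # team or person accountable for it
    services: List[str] = field(default_factory=list)  # what presents it
    rotation: Optional[str] = None       # how it is rotated, e.g. "manual"
    endpoint: Optional[str] = None       # host:port where it should be served


def expiring_within(records, days, now=None):
    """Records expiring within `days` of `now`, soonest first.

    Already-expired records are included on purpose: they are the ones
    most likely to have been forgotten.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(days=days)
    due = [r for r in records if r.not_after <= cutoff]
    return sorted(due, key=lambda r: r.not_after)


def unanswered(record):
    """Names of the inventory questions this record cannot answer."""
    gaps = []
    if not record.owner:
        gaps.append("owner")
    if not record.services:
        gaps.append("services")
    if not record.rotation:
        gaps.append("rotation")
    if not record.endpoint:
        gaps.append("endpoint")
    return gaps
```

Running `expiring_within(records, 90)` answers the first question, and any record where `unanswered(record)` is non-empty is a certificate nobody can fully account for yet.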
I’ve seen certificates with no owner. Certificates attached to services nobody remembered existed. Certificates everyone assumed were being monitored but weren’t.
That’s usually where the trouble starts.
The first step isn’t automation. It’s inventory.
A few years ago, I was helping clean up a certificate management process that had grown organically over time. The inventory exercise uncovered exactly what you’d expect: certificates with unclear ownership, services depending on certificates nobody had looked at in years, and monitoring that checked the secret store but never verified what was being served.
We also found certificates that had been expired for multiple years.
Not days. Years.
Nothing had broken.
That should have been reassuring. It wasn’t.
None of those problems were particularly complicated. They were just invisible until someone went looking for them.
You need to know what exists before you can automate it.
What a healthy lifecycle looks like
Once you have that inventory, automation becomes much easier. A healthy certificate lifecycle automates each of these steps:
- Generate key material
- Request or issue certificates
- Update secrets
- Reload services
- Verify the new certificate is actually being presented
That last step matters more than people think.
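One way to picture the steps above is as a single rotation function that only reports success once the endpoint presents the new certificate. This is a sketch under assumptions, not a real integration: the five callables and the `RotationError` name are hypothetical stand-ins for whatever your CA, secret store, and service reload mechanism actually provide.

```python
from typing import Callable


class RotationError(RuntimeError):
    """Raised when rotation finishes but the new certificate is not served."""


def rotate(
    generate_key: Callable[[], bytes],
    issue_cert: Callable[[bytes], bytes],
    update_secret: Callable[[bytes, bytes], None],
    reload_service: Callable[[], None],
    served_cert: Callable[[], bytes],
) -> bytes:
    """Run one rotation and return the new certificate.

    All five callables are hypothetical hooks supplied by the caller.
    """
    key = generate_key()          # 1. generate key material
    cert = issue_cert(key)        # 2. request or issue the certificate
    update_secret(key, cert)      # 3. update the secret store
    reload_service()              # 4. reload the service
    if served_cert() != cert:     # 5. verify what is actually presented
        raise RotationError(
            "secret updated, but the endpoint presents a different certificate"
        )
    return cert
```

The design choice worth noticing is that step 5 is part of the rotation itself. If the reload silently does nothing, the function fails loudly instead of leaving a correct secret and a stale endpoint.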
One mistake I see fairly often is monitoring what’s stored in a secret manager instead of what’s actually being served. A certificate can be sitting in Vault, AWS Secrets Manager, Azure Key Vault, or a Kubernetes Secret with the correct expiration date while the application is still presenting the old one.
The monitoring says everything is fine. The TLS handshake says otherwise.
That’s why validation needs to happen against the endpoint itself.
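Here is a minimal sketch of that kind of check using only the Python standard library: fetch the certificate from a live TLS handshake and compare its SHA-256 fingerprint with the one in the secret store. The function names are my own, the check assumes the leaf certificate is the first block in the stored PEM, and certificate verification is deliberately disabled so the check can still read a certificate that has already expired. Treat it as a monitoring probe, not as a way to establish trust.

```python
import hashlib
import socket
import ssl

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def first_cert_der(pem: str) -> bytes:
    """DER bytes of the first certificate in a PEM string.

    Assumption: the leaf certificate comes first if this is a bundle.
    """
    start = pem.index(PEM_BEGIN)
    end = pem.index(PEM_END, start) + len(PEM_END)
    return ssl.PEM_cert_to_DER_cert(pem[start:end])


def fingerprint_der(der: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def fetch_served_der(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Return the DER certificate the endpoint presents in a handshake."""
    ctx = ssl.create_default_context()
    # Verification is off on purpose: an expired or otherwise invalid
    # certificate should still be readable so it can be reported.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        raise RuntimeError(f"{host}:{port} presented no certificate")
    return der


def endpoint_serves(stored_pem: str, host: str, port: int = 443,
                    fetch=fetch_served_der) -> bool:
    """True if the endpoint presents the certificate held in the store.

    `fetch` is injectable so the comparison can be tested offline.
    """
    served = fingerprint_der(fetch(host, port))
    stored = fingerprint_der(first_cert_der(stored_pem))
    return served == stored
```

A check like `endpoint_serves(pem_from_store, "service.example.internal")` (hostname hypothetical) fails in exactly the situation described above: the store holds the new certificate while the handshake still returns the old one.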
On building your own CA
One more thing worth saying, even if it’s not popular: unless you have a clear understanding of what running an internal CA actually involves — the tooling, operational overhead, revocation, and recovery — it may not be worth building one.
Trusted third-party certificate authorities exist for a reason.
A lot of teams reach for an internal CA because it feels like the “real” solution, then spend the next two years maintaining infrastructure that was never the actual problem.
Certificate rotation should be boring
At the end of the day, certificate management shouldn’t be a recurring project. It should be a boring system that quietly does its job.
If certificate rotation still involves a spreadsheet, a calendar reminder, a runbook, and a maintenance window, there’s probably an opportunity to automate more of it.
Certificate expiration should be boring.
The multi-year-expired cert that breaks nothing is somehow worse, because it means your whole mental model of what’s load-bearing might be wrong.