Consul cluster outage
Incident Report for Fly.io
Resolved
This incident has been resolved.
Posted Mar 16, 2023 - 03:28 UTC
Monitoring
Our Consul cluster is now stable again. We've re-enabled Nomad deployments and are monitoring for any further issues.
Posted Mar 16, 2023 - 00:58 UTC
Update
The Consul cluster rebuild is nearly complete and we are working to restore service.
Posted Mar 16, 2023 - 00:04 UTC
Update
The Consul cluster rebuild is nearly complete and we are working to restore service.
Posted Mar 15, 2023 - 23:53 UTC
Update
We are working to build a new Consul cluster with 10x the RAM. We aren't yet sure, but we believe a routine DNS change may have created a thundering herd problem, causing Consul servers to immediately increase RAM usage by 500%. This is not ideal.
Posted Mar 15, 2023 - 22:42 UTC
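As a rough illustration of the thundering herd effect mentioned in the update above, the Python sketch below compares the peak per-second load when every client reconnects at the same instant with the load when each client waits a random delay first. It is not a model of the actual Consul clients or of the DNS change itself. The function name `peak_load`, the client count, and the 60-second jitter window are all hypothetical values chosen for the example.

```python
import random
from collections import Counter


def peak_load(num_clients, jitter_window_s, seed=0):
    """Return the largest number of clients reconnecting within any one-second bucket.

    Hypothetical helper for illustration only; the numbers are not taken from the incident.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    buckets = Counter()
    for _ in range(num_clients):
        # With no jitter window every client reconnects at t=0 (the herd).
        # Otherwise each client waits a random delay up to jitter_window_s seconds.
        delay = rng.uniform(0, jitter_window_s) if jitter_window_s > 0 else 0.0
        buckets[int(delay)] += 1
    return max(buckets.values())


# Assumed example values: 10,000 clients, 60-second jitter window.
herd = peak_load(10_000, jitter_window_s=0)
spread = peak_load(10_000, jitter_window_s=60)
print("peak reconnects per second without jitter:", herd)
print("peak reconnects per second with jitter:", spread)
```

Without jitter, all clients land in the same one-second bucket, which is the kind of simultaneous spike that can push server memory up sharply. Spreading the same reconnects over a window lowers the peak to a small fraction of the total.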
Identified
We're still working on recovering our Consul cluster.
Posted Mar 15, 2023 - 21:10 UTC
Monitoring
Our Consul cluster is experiencing an outage. This impacts queries to our API, including creating and modifying apps, as well as incoming network requests for recently deployed apps.
Posted Mar 15, 2023 - 19:20 UTC
This incident affected: Platform and Tools (API, Deployments).