Apps V1 Outage
Incident Report for Fly.io
Resolved
This incident has been resolved.
Posted Mar 14, 2023 - 08:07 UTC
Update
We currently have deploys enabled while we monitor Nomad's backlog.
Posted Mar 14, 2023 - 05:23 UTC
Update
We've disabled deploys again to allow Nomad to catch up on its job evaluations.
Posted Mar 14, 2023 - 04:13 UTC
Update
Nomad is working through a backlog of job evaluations, which is slowing deploys to the point where they're failing in many cases.
Posted Mar 14, 2023 - 04:12 UTC
Monitoring
We've restored our Nomad server cluster (now with 10x the RAM) and are re-enabling deploy functionality.
Posted Mar 14, 2023 - 03:31 UTC
Identified
We are working to restore the Nomad server cluster. Our current priority is restoring Nomad without disrupting running applications. There is a chance this will be unsuccessful and that applications will reboot unexpectedly when we get Nomad back online.
Posted Mar 14, 2023 - 01:56 UTC
Investigating
We have temporarily disabled Apps V1 deploys.

Our Apps V1 platform is currently operating in a severely degraded state. Apps that are already running will stay running, but any changes to VM state will fail, including deploys. This is a result of capacity issues in our underlying Nomad cluster. An attempt to grow our cluster has left it in a bad state, which we are working to correct.

Our API is also impacted and will be intermittently available until the Apps V1 platform is restored.
Posted Mar 14, 2023 - 00:44 UTC
This incident affected: Platform and Tools (API, Deployments).