Increased API failures
Incident Report for Fly.io
Resolved
This incident has been resolved.
Posted Oct 23, 2024 - 02:22 UTC
Monitoring
Our internal state is fully re-synchronized, and our metrics are returning to normal. We are continuing to monitor for potential ongoing issues.
Posted Oct 23, 2024 - 01:30 UTC
Update
Restoration of our state propagation system is complete. The system is now processing updates to re-synchronize to the latest state. Services and APIs should start to recover once this process completes.
Posted Oct 23, 2024 - 00:07 UTC
Update
Our state propagation system is significantly delayed. To speed up recovery, we will restore the system from a snapshot to clear the backlog. Your machines may be missing from fly m list output and some other API responses, but all of your started machines will still be running. The state will re-synchronize to the latest once the restoration is complete.
Posted Oct 22, 2024 - 23:08 UTC
Update
We are continuing to work on a fix for this issue.
Posted Oct 22, 2024 - 22:27 UTC
Update
Some of our APIs should now be functioning normally. We are still applying the fix to the remaining APIs.
Posted Oct 22, 2024 - 21:15 UTC
Update
We are continuing to apply the fix to all hosts in the fleet. Some hosts continue to see elevated API errors at this time.
Posted Oct 22, 2024 - 20:28 UTC
Update
We are currently in the process of rolling out a fix across our fleet.
Posted Oct 22, 2024 - 19:25 UTC
Update
We are continuing to work on a fix for this issue. Apps with autostart/autostop configured might also see an increased number of request errors.
Posted Oct 22, 2024 - 18:19 UTC
Identified
We have identified the cause of an increase in API errors across the platform and are working on a fix.
Posted Oct 22, 2024 - 18:06 UTC
This incident affected: Customer Applications, Machines API, and Deployments.