Increased API Errors

Incident Report for Fly.io

Resolved

This incident has been resolved.
Posted Nov 29, 2025 - 07:05 UTC

Update

Our metrics show that the Machines API has mostly recovered. We're still working through a small number of degraded Managed Postgres clusters.
Posted Nov 29, 2025 - 03:41 UTC

Monitoring

We have identified and implemented a fix for the root cause of this incident and are monitoring the results. The Machines API should be recovering.

On the Managed Postgres side, new MPG cluster creation should be working. A small number of clusters remain degraded or are experiencing connectivity issues. We are continuing to restore these as quickly as possible.
Posted Nov 29, 2025 - 02:58 UTC

Update

We are continuing to work on a fix for this issue. Users will still see elevated error rates when creating new apps, machines, and MPG clusters, as well as with other API operations. Users may also see newly created machines remain in a `created` state for an extended period before starting.
Posted Nov 29, 2025 - 02:30 UTC

Update

We are continuing to work on a fix for this issue.
Posted Nov 29, 2025 - 01:17 UTC

Update

We are continuing to see elevated error rates when creating new apps and new MPG clusters and when setting secrets. A small number of MPG clusters continue to see connectivity errors. We are continuing to work on these issues.
Posted Nov 29, 2025 - 00:33 UTC

Update

We continue to see increased error rates when creating new apps and MPG clusters, as well as when setting secrets. We are continuing to work to resolve these issues.

A small number of MPG clusters are still experiencing connectivity issues. We are working to restore these to a healthy state as quickly as possible.
Posted Nov 28, 2025 - 23:41 UTC

Update

We have identified some continuing issues with creating new apps and secrets and are working to resolve them.
Posted Nov 28, 2025 - 22:17 UTC

Update

Our monitoring indicates that Machines API, flyctl, and dashboard access should have recovered. Some Managed Postgres clusters may still be affected. Rest assured that your cluster's data is intact while we work to restore it to a working state.
Posted Nov 28, 2025 - 21:54 UTC

Identified

We have identified and implemented a fix for the API outage. Some Managed Postgres clusters may be affected as a secondary effect, and we are currently working to restore them.
Posted Nov 28, 2025 - 21:36 UTC

Investigating

We are investigating an increase in errors affecting numerous API endpoints, including those behind the Machines API, flyctl, and the dashboard.
Posted Nov 28, 2025 - 21:16 UTC

This incident affected: Managed Postgres (Management Plane - ORD, Management Plane - IAD, Management Plane - FRA, Management Plane - GRU, Management Plane - LAX, Management Plane - SYD), Dashboard, Machines API, and Deployments.