Slow API requests

Incident Report for Fly.io

Resolved

This incident has been resolved. All platform and API operations are working normally.

Posted Feb 27, 2026 - 20:21 UTC

Monitoring

API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery.

Background jobs are almost fully caught up. Users may still see slightly slower requests creating new apps / orgs, but they should complete successfully.

Sprite and MPG cluster creations are processing as normal.

Posted Feb 27, 2026 - 20:05 UTC

Update

A second fix has been deployed and database load has returned to normal, and API response times are beginning to normalize. Most Machines API requests should succeed as normal, and deploys to existing apps should also work.

We are working through a backlog of background jobs. New app / organization creations and other operations that rely on these jobs will continue to see increased latency or failures while we work through them. New MPG cluster and new Sprite creations continue to be impacted.

Posted Feb 27, 2026 - 19:41 UTC

Update

An initial fix has been deployed and we are seeing improvements in load and API performance. Some operations that rely on the GraphQL API, such as new app creations and some deployments, will continue to fail at this time.

We are continuing to work on restoring full availability.

Posted Feb 27, 2026 - 19:23 UTC

Update

We are currently seeing full API failures for requests to our GraphQL API and elevated failures for the Machines API. Direct calls to these APIs may fail, along with many flyctl commands.

We have identified the cause of the issue and are continuing to work on a fix.

Existing running machines and apps should continue to be reachable, but creates, deploys, or other features relying on platform API calls will fail at this time.

Posted Feb 27, 2026 - 19:05 UTC

Update

New Sprite creations are also timing out or failing at this time. We are continuing to work on a fix for this issue.

Posted Feb 27, 2026 - 18:59 UTC

Update

We are continuing to work on a fix for this issue.

Posted Feb 27, 2026 - 18:53 UTC

Identified

We have identified the cause of the increased latency and are working on a fix.

The most common errors we are seeing are timeouts when users attempt to perform an action against a newly created app / machine resource. These may time out or fail with an `app|machine not found` error.

Posted Feb 27, 2026 - 18:52 UTC

Investigating

We are investigating increased API request latency and timeouts with the main platform API.
This is impacting multiple operations, including creating, querying, or performing actions against machines, as well as platform-level operations like adding payment methods.

Posted Feb 27, 2026 - 18:50 UTC

This incident affected: Dashboard, Machines API, Deployments, Remote Builds, and Sprites.