All Systems Operational

About This Site

This page is for updates about global incidents. It does not include updates about routine hardware failures or isolated infrastructure events with limited impact. For a personalized view of all events that might affect your apps, please check the personalized status page in your Fly Organization's dashboard. For all internal incidents and other activity, please check the Infra Log.

Customer Applications Operational
Dashboard Operational
Machines API Operational
Regional Availability Operational
AMS - Amsterdam, Netherlands Operational
ARN - Stockholm, Sweden Operational
BOM - Mumbai, India Operational
CDG - Paris, France Operational
DFW - Dallas, Texas (US) Operational
EWR - Secaucus, NJ (US) Operational
FRA - Frankfurt, Germany Operational
GRU - Sao Paulo, Brazil Operational
IAD - Ashburn, Virginia (US) Operational
JNB - Johannesburg, South Africa Operational
LAX - Los Angeles, California (US) Operational
LHR - London, United Kingdom Operational
NRT - Tokyo, Japan Operational
ORD - Chicago, Illinois (US) Operational
SIN - Singapore Operational
SJC - San Jose, California (US) Operational
SYD - Sydney, Australia Operational
YYZ - Toronto, Canada Operational
Persistent Storage (Volumes) Operational
Deployments Operational
Remote Builds Operational
Logs Operational
Metrics Operational
SSL/TLS Certificate Provisioning Operational
UDP Anycast Operational
Fly Machine Image Registry 1 Operational
Fly Machine Image Registry 2 Operational
Extensions Operational
Upstash for Redis Operational
DNS Operational
Fly Machine .internal DNS Operational
Fly Machine External DNS Operational
*.flyio.net Nameservers Operational
flydns.net Operational
Billing Operational
Usage Metrics API Operational
Stripe API Connection Operational
Corrosion Operational
Managed Postgres Operational (99.94 % uptime over the past 90 days)
Management Plane - ORD Operational (99.96 % uptime over the past 90 days)
Management Plane - IAD Operational (99.76 % uptime over the past 90 days)
Management Plane - FRA Operational (100.0 % uptime over the past 90 days)
Management Plane - GRU Operational (100.0 % uptime over the past 90 days)
Management Plane - LAX Operational (100.0 % uptime over the past 90 days)
Management Plane - SYD Operational (99.98 % uptime over the past 90 days)
Management Plane - AMS Operational (99.73 % uptime over the past 90 days)
Management Plane - LHR Operational (100.0 % uptime over the past 90 days)
Management Plane - NRT Operational (100.0 % uptime over the past 90 days)
Management Plane - SIN Operational (99.84 % uptime over the past 90 days)
Management Plane - SJC Operational (100.0 % uptime over the past 90 days)
Management Plane - YYZ Operational (100.0 % uptime over the past 90 days)
Phoenix.new Operational
Support Portal Operational
Sprites Operational

Scheduled Maintenance

Network Maintenance in GRU Mar 4, 2026 03:00-09:00 UTC

An upstream provider is performing network maintenance in GRU on 2026-03-04, from 03:00 UTC (midnight local time) to 09:00 UTC (6:00 am local time). A loss of connectivity for up to 30 minutes is expected within the scheduled maintenance window.
Posted on Feb 20, 2026 - 19:32 UTC
Feb 27, 2026
Resolved - This incident has been resolved. All platform and API operations are working normally.
Feb 27, 20:21 UTC
Monitoring - API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery.

Background jobs are almost fully caught up. Users may still see slightly slower requests creating new apps / orgs, but they should complete successfully.

Sprite and MPG cluster creations are processing as normal.

Feb 27, 20:05 UTC
Update - A second fix has been deployed and database load has returned to normal, resulting in API response times beginning to normalize. Most Machines API requests should succeed as normal, and deploys to existing apps should also work.

We are working through a backlog of background jobs. New app / organization creations and other operations that use these jobs will continue to see increased latency or failures while we work through the backlog. New MPG cluster and new Sprite creations continue to be impacted.

Feb 27, 19:41 UTC
Update - An initial fix has been deployed and we are seeing improvements in load and API performance. Some operations that rely on the GraphQL API, such as new app creations and some deployments, will continue to fail at this time.

We are continuing to work on restoring full availability.

Feb 27, 19:23 UTC
Update - We are currently seeing full API failures for requests to our GraphQL API and elevated failures for the Machines API. Direct calls to these APIs may fail, along with many flyctl commands.

We have identified the cause of the issue and are continuing to work on a fix.

Existing running machines and apps should continue to be reachable, but creates, deploys, or other features relying on platform API calls will fail at this time.

Feb 27, 19:05 UTC
Update - New Sprite creations are also timing out or failing at this time. We are continuing to work on a fix for this issue.
Feb 27, 18:59 UTC
Update - We are continuing to work on a fix for this issue.
Feb 27, 18:53 UTC
Identified - We have identified the cause of the increased latency and are working on a fix.

The most common error we are seeing is a timeout when users attempt to perform an action against a newly created app / machine resource. Those requests may time out or fail with an `app|machine not found` error.

Feb 27, 18:52 UTC
Investigating - We are investigating increased API request latency and timeouts with the main platform API.
This is impacting multiple operations, including creating, querying, or performing actions against machines, as well as platform-level operations like adding payment methods.

Feb 27, 18:50 UTC
Resolved - This incident has been resolved.
Feb 27, 17:54 UTC
Monitoring - We have provisioned additional capacity in DFW and IAD and are monitoring to ensure machine and builder starts are succeeding consistently.
Feb 27, 17:31 UTC
Identified - These regions (DFW - Dallas, TX and IAD - Ashburn, VA) are currently low on capacity. New machine creates in these regions might fail temporarily, and Depot builders may be unavailable, causing deploys to hang at "Waiting for Depot builder".
If you are having issues with Depot builders, consider moving them to a different non-IAD, non-DFW region in your Fly.io dashboard's "Settings" page under "App builders", or try `--depot=false`.

Feb 27, 15:34 UTC
Feb 26, 2026
Resolved - This incident has been resolved.
Feb 26, 22:28 UTC
Monitoring - We're continuing to monitor after having added more capacity to our DFW and IAD regions.

Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.

Feb 26, 20:19 UTC
Update - We have added additional capacity in DFW and IAD regions and are monitoring the impact.

New machine creates and deploys without volumes are seeing improved success rates. Deploys using depot builders in those regions are also improving, with much quicker builder start times.

Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users should use `fly volume fork --vm-memory ` to fork the volume to a host with more capacity, then retry the deploy or start command using the new volume.

Feb 26, 18:57 UTC
Update - We've identified that some newly created Managed Postgres clusters are failing to come up healthy in these regions.
Feb 26, 17:18 UTC
Update - New machine creates in these regions might fail temporarily, and Depot builders may be unavailable. If you are having issues with Depot builders, consider moving them to a different region, or try `--depot=false`.
Feb 26, 17:05 UTC
Identified - We have identified the problem and are working on a fix.
Feb 26, 17:00 UTC
Feb 25, 2026

No incidents reported.

Feb 24, 2026
Resolved - This incident has been resolved.
Feb 24, 17:51 UTC
Update - A slow deploy is causing Sprites API degradation. We are implementing a fix.
Feb 24, 17:24 UTC
Identified - A slow deploy is causing Sprites API degradation. We are implementing a fix.
Feb 24, 17:23 UTC
Resolved - Metrics processing has caught up, and we don't see any data loss.
Feb 24, 11:06 UTC
Update - Delayed metrics are still being processed.
Feb 24, 09:35 UTC
Monitoring - Metrics are coming back online, but it will take a little time to process what's backed up in the queues.
Feb 24, 06:46 UTC
Update - We're continuing to work with VictoriaMetrics support on a fix for this issue.
Feb 24, 05:49 UTC
Identified - In some cases data is missing or lagging. We've identified the problem and are working on a fix.
Feb 24, 04:33 UTC
Resolved - This incident has been resolved.
Feb 24, 10:44 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 24, 10:25 UTC
Investigating - We are currently investigating issues creating new Sprites.
Feb 24, 09:39 UTC
Feb 23, 2026
Resolved - This incident has been resolved as of 20:30 UTC.
Feb 23, 20:30 UTC
Investigating - We are currently investigating issues with the MPG control plane. Users may experience delays or hanging when creating or deleting databases via the dashboard or CLI.
Feb 23, 15:00 UTC
Feb 22, 2026

No incidents reported.

Feb 21, 2026

No incidents reported.

Feb 20, 2026
Resolved - This incident has been resolved.
Feb 20, 20:49 UTC
Monitoring - The fix has been rolled out and we are seeing deploys using depot builder succeeding normally. We continue to monitor to ensure full recovery.

Depot builders have been re-enabled as the default option for new deploys.

Feb 20, 19:38 UTC
Update - A fix is being rolled out. Fly builders remain the default while this is deployed.
Feb 20, 17:59 UTC
Identified - We are again seeing elevated latency provisioning Depot builders on new deploys. Users may see deploys using Depot builders hang or time out at the "Waiting for Depot Builder" step. We are working on a fix.

We are switching all deploys to use the default Fly builders in the meantime.

If desired, users can manually switch back to Depot builders using `fly deploy --depot=true`, but may continue to see latency issues at this time.

Feb 20, 16:39 UTC
Monitoring - We have seen elevated latency provisioning Depot builders during deployments over the past hour. This caused some deploys to hang or time out at the "Waiting for Depot Builder" step during this period.

Latency has improved and builder provision times are back to normal. We're continuing to monitor to ensure latency remains normal.

Feb 20, 16:14 UTC
Resolved - Network traffic in LHR has been stable for some time now; we are not seeing any further issues.
Feb 20, 11:57 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 20, 11:21 UTC
Investigating - We’re currently investigating this issue.
Feb 20, 10:52 UTC
Resolved - This incident has been resolved.
Feb 20, 00:05 UTC
Identified - While we have seen some improvement from the previous fix, we are still seeing elevated rates of Registry connection issues. Users may continue to see slower machine creates and deploys due to slow image pulls. Deploys may succeed on a retry.

We are continuing to work on restoring normal registry performance.

Feb 19, 22:24 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 19, 21:49 UTC
Identified - The issue has been identified and a fix is being implemented.
Feb 19, 21:43 UTC
Investigating - We are currently investigating this issue.
Feb 19, 21:14 UTC
Feb 19, 2026
Feb 18, 2026
Resolved - This incident has been resolved.
Feb 18, 16:44 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 18, 16:28 UTC
Update - We are continuing to work on a fix for this issue.
Feb 18, 16:23 UTC
Identified - The issue has been identified and a fix is being implemented.
Feb 18, 16:22 UTC
Feb 17, 2026
Resolved - Earlier today, an issue caused elevated rate limiting and some deployment timeouts. A fix is in place and deployments are back to normal.
Feb 17, 14:24 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 17, 13:42 UTC
Identified - We’re investigating elevated 429 errors from flaps that are causing deployment timeouts. Affected deploys are failing with:
✖ Failed: error waiting for release_command machine XX to finish running: timeout reached waiting for machine's state to change
Your machine never reached the state "destroyed".

Feb 17, 13:06 UTC
Feb 16, 2026

No incidents reported.

Feb 15, 2026

No incidents reported.

Feb 14, 2026
Resolved - This incident has been resolved.
Feb 14, 14:27 UTC
Monitoring - A fix has been implemented and we are seeing full recovery of the control plane in ORD. With that recovery we are seeing impacted replicas catching up and clusters returning to normal health. We're continuing to monitor for full recovery.
Feb 14, 14:07 UTC
Update - We are continuing to work on a fix for this issue.
Feb 14, 13:47 UTC
Identified - The issue has been identified and we are working on a fix. The majority of MPG clusters in ORD continue to run normally, though some users may still see degraded replicas at this time. Some clusters in the region will have experienced a primary -> replica failover.
Feb 14, 11:47 UTC
Investigating - We are currently investigating issues with the MPG control plane in ORD. A small number of clusters in the region may be seeing replication lag or PgBouncer connectivity issues at this time.
Feb 14, 11:33 UTC
Feb 13, 2026

No incidents reported.