All Systems Operational

About This Site

This page is for updates about global incidents. It does not include updates about routine hardware failures or isolated infrastructure events that have limited impact. For a personalized view of all events that might affect your apps, please check the personalized status page in your Fly Organization's dashboard. For all internal incidents and other activities, please check Infra Log.

Customer Applications Operational
Dashboard Operational
Machines API Operational
Regional Availability Operational
AMS - Amsterdam, Netherlands Operational
ARN - Stockholm, Sweden Operational
ATL - Atlanta, Georgia (US) Operational
BOG - Bogotá, Colombia Operational
BOM - Mumbai, India Operational
CDG - Paris, France Operational
DEN - Denver, Colorado (US) Operational
DFW - Dallas, Texas (US) Operational
EWR - Secaucus, NJ (US) Operational
EZE - Ezeiza, Argentina Operational
FRA - Frankfurt, Germany Operational
GDL - Guadalajara, Mexico Operational
GIG - Rio de Janeiro, Brazil Operational
GRU - Sao Paulo, Brazil Operational
HKG - Hong Kong Operational
IAD - Ashburn, Virginia (US) Operational
JNB - Johannesburg, South Africa Operational
LAX - Los Angeles, California (US) Operational
LHR - London, United Kingdom Operational
MAD - Madrid, Spain Operational
MEL - Melbourne, Australia Operational
MIA - Miami, Florida (US) Operational
NRT - Tokyo, Japan Operational
ORD - Chicago, Illinois (US) Operational
OTP - Bucharest, Romania Operational
PHX - Phoenix, Arizona (US) Operational
QRO - Querétaro, Mexico Operational
SCL - Santiago, Chile Operational
SEA - Seattle, Washington (US) Operational
SIN - Singapore Operational
SJC - San Jose, California (US) Operational
SYD - Sydney, Australia Operational
WAW - Warsaw, Poland Operational
YUL - Montréal, Canada Operational
YYZ - Toronto, Canada Operational
Persistent Storage (Volumes) Operational
Deployments Operational
Remote Builds Operational
Logs Operational
Metrics Operational
SSL/TLS Certificate Provisioning Operational
UDP Anycast Operational
Fly Machine Image Registry 1 Operational
Fly Machine Image Registry 2 Operational
Extensions Operational
Upstash for Redis Operational
DNS Operational
Fly Machine .internal DNS Operational
Fly Machine External DNS Operational
*.fly.dev Nameservers Operational
*.flyio.net Nameservers Operational
Billing Operational
Usage Metrics API Operational
Stripe API Connection Operational
Corrosion Operational
Past Incidents
Dec 2, 2024

No incidents reported today.

Dec 1, 2024

No incidents reported.

Nov 30, 2024

No incidents reported.

Nov 29, 2024

No incidents reported.

Nov 28, 2024

No incidents reported.

Nov 27, 2024

No incidents reported.

Nov 26, 2024
Resolved - This incident has been resolved.
Nov 26, 23:25 UTC
Monitoring - A fix has been implemented and both Machines API and GraphQL API performance have returned to normal.
Nov 26, 21:13 UTC
Identified - We have identified the cause of the API latency increase and are working to mitigate it.
Nov 26, 20:28 UTC
Investigating - We are currently investigating elevated error rates with our Machines and GraphQL APIs.

Users may experience slower responses or timeouts when using the Machines API and flyctl commands.

Nov 26, 20:23 UTC
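
For clients that saw timeouts or elevated error rates during an incident like the one above, retrying with exponential backoff and jitter is one common way to ride out a degraded API. The following is a minimal sketch, not an official Fly.io client: `TransientError` and `call_with_backoff` are hypothetical names introduced here, and the caller is assumed to translate timeouts or 5xx responses into `TransientError`.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (timeout, 5xx response)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with jittered backoff.

    The delay before each retry is drawn uniformly from
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter").
    The final failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the delay can be replaced in tests. Capping the delay with `max_delay` keeps a long outage from producing unbounded waits.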
Resolved - We have determined that some customers' machines are being throttled due to our full rollout of CPU quotas, separate from the incident yesterday. This in turn caused apparent networking issues. We have now temporarily rolled back these changes while we work with customers to better adapt to CPU quotas.
Nov 26, 16:11 UTC
Investigating - We are aware of customer-reported issues with internal networking and are investigating.
Nov 26, 14:30 UTC
Resolved - This incident has been resolved.
Nov 26, 08:15 UTC
Update - We've scaled up our systems and applied fixes to our API. Everything should be operational now.
Nov 26, 07:52 UTC
Update - We are scaling up our systems to handle the increased traffic.
Nov 26, 05:43 UTC
Update - All hosts have completed the restoration process and we are seeing our overall Corrosion cluster health and performance return to normal.

Machines API and GraphQL API error rates are improving, but some users may still see elevated rates of request timeouts and/or 504 errors when using the Machines API or flyctl commands. We are continuing to monitor these services as they recover.

Nov 26, 03:42 UTC
Monitoring - The restore process has completed on the majority of hosts in our fleet and we are seeing overall Corrosion cluster health and performance return to normal.

A small number of hosts are still being worked on; we aim to have them restored shortly.

Nov 26, 02:31 UTC
Update - We are running a restoration and reseed process to bring the Corrosion cluster back to a healthy, current state.
During this restoration process, you may see elevated error rates on machines or apps that have been recently updated.

Nov 26, 02:06 UTC
Update - The updates have been applied; however, we are still not seeing recovery on all Corrosion nodes. We are continuing to work on a fix.

Machines API and proxy performance remain degraded, especially for newly created and updated machines.

Nov 25, 23:58 UTC
Update - The Machines API issues stem from a propagation delay in our global state store, Corrosion.

We have completed deploying a configuration change to our Corrosion cluster and will be applying these changes to each node shortly. We expect improvement once the changes are applied.

In the meantime, users may still see degraded Machines API and proxy performance, especially with newly created machines.

Nov 25, 22:15 UTC
Identified - The issue has been identified and a fix is being implemented.
Nov 25, 20:20 UTC
Investigating - We are investigating degraded API performance.
Nov 25, 20:10 UTC
Nov 25, 2024
Nov 24, 2024

No incidents reported.

Nov 23, 2024

No incidents reported.

Nov 22, 2024
Resolved - This incident has been resolved.
Nov 22, 04:29 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Nov 21, 21:41 UTC
Investigating - We are investigating an issue with application log search. This impacts Fly Metrics log search panels and historical app logs.

Streaming logs using `fly logs`, the Live Logs page in the dashboard, and Fly Log Shipper services continue to work as expected.

Nov 21, 15:56 UTC
Nov 21, 2024
Nov 20, 2024
Completed - The scheduled maintenance has been completed.
Nov 20, 05:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Nov 20, 02:00 UTC
Scheduled - Our network provider is performing an emergency switch replacement during this window. A network outage of up to one hour is expected during this maintenance window. Please verify your Fly apps are deployed to more than one region to avoid impact.
Nov 20, 00:30 UTC
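
One way to check the multi-region advice in the maintenance notice above is to count running machines per region. This is a minimal sketch under stated assumptions: it takes a list of machine records that each carry `region` and `state` keys (the shape assumed here for a machine listing, not a confirmed schema), and `regions_in_use` and `is_multi_region` are hypothetical helper names.

```python
from collections import Counter


def regions_in_use(machines):
    """Count machines per region, considering only those in the 'started' state.

    Assumes each record is a dict with 'region' and 'state' keys.
    """
    return Counter(m["region"] for m in machines if m.get("state") == "started")


def is_multi_region(machines):
    """Return True if started machines span more than one region."""
    return len(regions_in_use(machines)) > 1


# Illustrative data only; not taken from a real app.
sample = [
    {"id": "a", "region": "ams", "state": "started"},
    {"id": "b", "region": "ams", "state": "started"},
    {"id": "c", "region": "fra", "state": "stopped"},
]
print(regions_in_use(sample), is_multi_region(sample))
```

Stopped machines are excluded on the assumption that they would not be serving traffic when a single region goes offline; adjust the filter if your app starts machines on demand.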
Nov 19, 2024
Resolved - This incident has been resolved.
Nov 19, 19:48 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Nov 19, 19:04 UTC
Investigating - We are investigating an issue causing application log search to be unavailable. This is affecting the Fly Metrics log search panels and the historical application logs initially returned by the `fly logs` command.

Streaming logs using `fly logs`, the Live Logs page in the dashboard, and Fly Log Shipper services continue to work as expected.

Nov 19, 18:46 UTC
Nov 18, 2024

No incidents reported.