Critical host failure in FRA

Resolved

Partial outage

Started almost 3 years ago

Lasted about 5 hours

Affected

Europe

Updates

Resolved
September 05, 2023 at 9:30 PM
This incident has been resolved.
Monitoring
September 05, 2023 at 6:40 PM
We successfully recovered the impacted servers.

Some workloads are still being recovered automatically.
Identified
September 05, 2023 at 5:05 PM
We identified the root cause of the issue: some of the physical servers were lost because of high load. It seems that rescheduling the impacted workloads triggered a cascading failure on other nodes.

We are actively trying to recover access to the impacted servers.

Some workloads are still impacted and will soon recover.
New builds and container starts are also impacted.
Investigating
September 05, 2023 at 4:33 PM
We are seeing a sharp increase in container restarts and container boot failures in FRA.

We are currently investigating this incident.

Koyeb - Critical host failure in FRA – Incident details