Flexera - Spot - US East 1 - Service Degradation (Instance Launch Failures)

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Flexera - Spot - US East 1a - Database connectivity and reduced request handling capacity

Timeframe: May 7, 2026, 4:25 PM PST to May 8, 2026, 7:04 PM PST

Incident Summary

On May 7, 2026, Flexera identified a service degradation impacting Spot services in the US East 1a region. During the incident, customers experienced difficulties launching instances. In addition, some backend services experienced reduced request handling capacity and intermittent database connectivity issues, resulting in degraded provisioning behavior and delayed scaling operations.

Initial investigation determined that the issue coincided with a broader disruption affecting an external cloud infrastructure provider. The external disruption introduced elevated latency and intermittent service instability across dependent infrastructure components, which contributed to degraded provisioning and backend service performance within the Spot platform.

During the investigation, engineering teams also identified a database issue, triggered by the service provider outage, that posed a potential performance risk under degraded infrastructure conditions. The issue was mitigated early in the response process, and system stability improved following remediation actions.

Throughout the incident, technical teams continuously monitored service health, validated provisioning behavior, and assessed mitigation options to minimize customer impact while the external provider disruption remained active. Over several hours following mitigation, Spot services remained stable with no recurrence of customer-facing issues observed.

After an extended monitoring period confirmed continued stability, and the external provider confirmed that the underlying disruption had been fully remediated, the incident was considered resolved on May 8, 2026, at 7:04 PM PST.

Root Cause

The issue was primarily caused by a disruption affecting an external cloud infrastructure provider supporting services in the US East 1a region. The disruption resulted in increased latency, intermittent connectivity issues, and degraded infrastructure performance, which impacted Spot instance provisioning operations and related backend services.

Contributing Factors:

  • Elevated latency and intermittent failures across dependent infrastructure services increased provisioning delays and request instability.
  • A database performance issue, triggered by the underlying service provider disruption, introduced additional load under degraded infrastructure conditions and increased the risk of intermittent service instability.
  • The prolonged nature of the external provider disruption extended the duration of customer impact and required continuous monitoring and mitigation activities.

Remediation Actions

The following remediation steps were implemented to restore service functionality:

  • Investigated and monitored infrastructure and provisioning service behavior across impacted Spot components.
  • Identified and mitigated a database performance degradation contributing to elevated performance risk.
  • Validated provisioning stability and backend service recovery following mitigation efforts.
  • Performed extended monitoring to ensure sustained service stability and confirm the absence of recurring customer impact.

Future Preventative Measures

  • Alerting Reliability Improvements: Improve alert delivery validation and monitoring to ensure critical database alerts are consistently propagated across all operational notification channels, including collaboration and incident management platforms.
  • Regional Resilience Evaluation: Evaluate additional regional resilience and failover capabilities to reduce the impact scope of regional infrastructure disruptions. Any future implementation will be subject to cost-impact analysis and internal approval processes.
  • Third-Party Provider Escalation Process: Enhance operational procedures for third-party infrastructure incidents by establishing earlier escalation and engagement processes with external cloud service providers during active service disruptions.
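
As an illustration of the alert delivery validation described under Alerting Reliability Improvements, the following is a minimal sketch, not Flexera's actual tooling. It assumes a synthetic test alert can be sent to each notification channel and that each channel reports whether delivery was acknowledged. The names `Alert`, `Channel`, and `undelivered_channels`, as well as any channel names used with them, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alert:
    """A synthetic alert used only to exercise the delivery path (hypothetical shape)."""
    alert_id: str
    severity: str
    message: str


# Assumption: a channel is modelled as a callable that returns True
# once the downstream platform has acknowledged the alert.
Channel = Callable[[Alert], bool]


def undelivered_channels(alert: Alert, channels: Dict[str, Channel]) -> List[str]:
    """Send the alert to every channel and return the names that did not acknowledge it."""
    missing = []
    for name, send in channels.items():
        try:
            delivered = send(alert)
        except Exception:
            # A channel that raises is treated the same as one that drops the alert.
            delivered = False
        if not delivered:
            missing.append(name)
    return sorted(missing)
```

Run on a schedule with a test alert, a non-empty result would indicate that at least one channel is not receiving critical database alerts and should itself be escalated.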
Posted May 13, 2026 - 03:23 PDT

Resolved

Following an extended period of monitoring, our teams have confirmed that services have remained stable. The external service provider has also confirmed full resolution of their outage. This incident is now resolved, and a detailed post-mortem report will be shared with additional insights.
Posted May 08, 2026 - 22:23 PDT

Update

Our technical teams have completed internal stabilization actions to address the earlier service impact, and Spot services have remained stable for the last several hours. We have not observed any recurrence of customer-facing impact, and instance provisioning behavior continues to operate as expected.

We are continuing to monitor the environment and our service provider’s updates closely. The incident will remain in monitoring for the next few hours, with closure to follow once this additional monitoring period is complete.
Posted May 08, 2026 - 12:39 PDT

Update

Some customers may experience residual issues following the recent fix. Our technical teams are actively working to address these and ensure full service stability. We will provide further updates as more information becomes available.
Posted May 08, 2026 - 01:25 PDT

Monitoring

Our teams have implemented mitigation measures to address the impact from the external service disruption, and services have now been restored. We are actively monitoring the platform to ensure continued stability and will provide further updates as more information becomes available.
Posted May 07, 2026 - 23:11 PDT

Identified

Incident Description: We are currently investigating multiple issues impacting Spot services. Customers may experience difficulties launching instances in the affected region, resulting in degraded provisioning capabilities and service performance. Initial findings indicate that the issue is related to an ongoing disruption within our cloud service provider, specifically in the us-east-1 region.

Priority: P2

Restoration Activity: Our technical teams are actively engaged and are working with the service provider to monitor the ongoing regional issue. We are assessing the impact on dependent services, including database connectivity and request processing, while identifying potential mitigation options. Further updates will be provided as more information becomes available and as service stability improves.
Posted May 07, 2026 - 20:00 PDT
This incident affected: Spot (Spot UI, Spot API, Spot Website).