Description: Flexera One Platform – North America – Access Disruption
Timeframe: July 5, 2025, 6:58 AM to 8:37 AM PDT
Incident Summary
On July 5, 2025, at 6:58 AM PDT, the Flexera One platform in the North America region experienced an access disruption. During this period, customers may have been unable to load the landing page or access key workflows within the platform.
The disruption was caused by a failure in a core service responsible for routing platform traffic. The affected component became unresponsive, which made the platform temporarily unavailable. The platform’s orchestration system automatically provisioned a replacement, and traffic was rerouted without manual intervention.
Platform availability was restored by 8:37 AM PDT. Comprehensive checks were completed to confirm full functionality, and no customer reports of issues were received during the incident.
Root Cause
Primary Root Cause:
A failure occurred in one of the infrastructure components responsible for traffic routing. The node hosting the gateway service became unresponsive, which caused a temporary disruption in platform access. A replacement node was automatically provisioned, restoring routing capabilities without manual intervention.
Contributing Factors:
• Workload Placement: System components responsible for routing traffic were deployed alongside other workloads on shared infrastructure. This configuration may have contributed to increased resource demand ahead of the failure. Measures are being implemented to isolate critical workloads more effectively.
• Alert Routing Gaps: While multiple alerts were triggered during the incident through product-specific monitoring systems, infrastructure-level alerting did not provide sufficient visibility into the platform-level disruption. Enhancements to alert coverage are in progress to support timely detection and awareness of core service impacts.
Remediation Actions
Future Preventative Measures
Following this incident, a detailed root cause analysis and internal retrospective were completed to identify areas for long-term improvement. The following workstreams were initiated under a platform reliability initiative aimed at strengthening infrastructure performance and minimizing the risk of recurrence. While the underlying cause of the infrastructure disruption remains under investigation, the actions already implemented and those underway are expected to significantly improve the platform’s ability to detect, respond to, and recover from similar events in the future.