Flexera One Platform - NA - Service Degradation

Incident Report for Flexera System Status Dashboard

Postmortem

Description: Flexera One Platform – North America – Access Disruption

Timeframe: July 5, 2025, 6:58 AM to 8:37 AM PDT

Incident Summary

On July 5, 2025, at 6:58 AM PDT, the Flexera One platform in the North America region experienced an access disruption. During this period, customers may have been unable to load the landing page or access key workflows within the platform.

The disruption was caused by a failure in a core service responsible for routing platform traffic. The affected component became unresponsive, which led to temporary unavailability. A replacement was automatically provisioned by the platform’s orchestration system, and traffic was successfully rerouted without manual intervention.

Platform availability was restored by 8:37 AM PDT. Comprehensive checks were completed to confirm full functionality, and no customer-reported issues were received during the incident.

Root Cause

Primary Root Cause:

A failure occurred in one of the infrastructure components responsible for traffic routing. The node hosting the gateway service became unresponsive, which caused a temporary disruption in platform access. A replacement node was automatically provisioned, restoring routing capabilities without manual intervention.

Contributing Factors:

• Workload Placement: System components responsible for routing traffic were deployed alongside other workloads on shared infrastructure. This configuration may have contributed to increased resource demand ahead of the failure. Measures are being implemented to isolate critical workloads more effectively.
• Alert Routing Gaps: While multiple alerts were triggered during the incident through product-specific monitoring systems, infrastructure-level alerting did not provide sufficient visibility into the platform-level disruption. Enhancements to alert coverage are in progress to support timely detection and awareness of core service impacts.

Remediation Actions

  1. Automated Traffic Recovery: The platform’s orchestration system automatically provisioned a healthy replacement for the failed component, restoring traffic routing without requiring manual intervention.
  2. Post-Recovery Readiness Checks: After restoration, teams performed comprehensive validations across key workflows to confirm full accessibility and identify any residual impact.
  3. Targeted Infrastructure Enhancements: Following the incident, our technical teams conducted a detailed analysis and implemented targeted infrastructure adjustments to reduce the risk of recurrence. As part of this effort, throughput capacity for critical services was increased to better accommodate variable usage patterns and provide more consistent traffic handling. This adjustment is part of a broader initiative to strengthen performance and reduce the risk of future resource-related disruptions.

Future Preventative Measures

Following this incident, a detailed root cause analysis and internal retrospective were completed to identify areas for long-term improvement. The following workstreams were initiated under a platform reliability initiative aimed at strengthening infrastructure performance and minimizing the risk of recurrence. While the underlying cause of the infrastructure disruption remains under investigation, the actions already implemented and those underway are expected to significantly improve the platform’s ability to detect, respond to, and recover from similar events in the future.

  1. Increased Throughput Capacity: Throughput limits for key platform services were raised to accommodate varying levels of system usage. This change is expected to reduce the likelihood of resource-related disruptions under peak or unexpected load. Analysis is ongoing to identify the factors contributing to usage spikes.
  2. Infrastructure Resource Optimization: Work is underway to isolate supporting system processes from core routing services. This adjustment aims to reduce resource contention on shared infrastructure and improve the efficiency of traffic handling during variable demand.
  3. Planned Upgrade of Gateway Codebase: An upgrade to the gateway component is currently being planned to incorporate the latest stability improvements. This initiative is intended to improve resilience, observability coverage, and the ability to diagnose abnormal behavior in future incidents.
Posted Jul 15, 2025 - 20:13 PDT

Resolved

Services have been fully restored, and the platform is operating normally. Our teams are reviewing the incident to identify opportunities for improvement and prevent similar occurrences in the future. We will conduct a full retrospective in the coming days as part of our commitment to continuous improvement.
Posted Jul 05, 2025 - 09:29 PDT

Monitoring

Services are now back online, and customers should be able to access the platform as expected. Our teams are continuing to investigate the incident to confirm stability and identify the root cause. Further updates will be shared as progress continues.
Posted Jul 05, 2025 - 09:06 PDT

Update

Our teams are investigating the cause of the service degradation and are working on rerouting traffic as part of our efforts to restore functionality. We are taking steps to minimize customer impact while continuing to pursue a complete resolution. Further updates will be shared as progress continues.
Posted Jul 05, 2025 - 08:56 PDT

Investigating

Issue Description: We are currently investigating a service degradation affecting the Flexera One platform in the North America region. Customers may experience issues accessing services at this time.

Priority: P1

Restoration Activity: Our technical teams have been engaged and are working to identify the affected service and determine the scope of impact. Further updates will be shared as the investigation progresses.
Posted Jul 05, 2025 - 08:11 PDT
This incident affected: Flexera One - IT Asset Management - North America (IT Asset Management - US Beacon Communication, IT Asset Management - US Inventory Upload, IT Asset Management - US Login Page, IT Asset Management - US Batch Processing System, IT Asset Management - US Business Reporting, IT Asset Management - US SaaS Manager), Flexera One - IT Visibility - North America (IT Visibility US), Flexera One - Cloud License Management - North America (Cloud License Management - US), and Flexera One - Cloud Management - North America (Cloud Cost Optimization - US, Cloudscape).