Monitoring: Elevated Packet Loss
Resolved
Feb 26 at 11:00am GMT
Post-Incident Review
1. Incident Summary
- Incident ID: 518417
- Incident Date & Time: Feb 24, 11:14 PM / Feb 25, 12:45 AM / Feb 25, 11:15 PM / Feb 26, 12:30 AM
- Affected Systems/Services: UK-A Routing Upstream
- Severity Level: Low to Medium
2. Problem
Brief network disruptions occurred at UK-A, resulting in four outages of approximately 40 seconds each.
3. Timeline
Date/Time | Event Description |
---|---|
Feb 24, 11:14 PM GMT | No packets returned. |
Feb 25, 12:45 AM GMT | No packets returned. |
Feb 25, 12:50 AM GMT | Support case opened with upstream provider. |
Feb 25, 11:15 PM GMT | No packets returned. |
Feb 26, 12:30 AM GMT | No packets returned. |
Feb 26, 10:00 AM GMT | Root cause identified. |
4. Root Cause Analysis
- Root Cause: The facility provider confirmed that an upstream provider was performing maintenance and did not cleanly shut down the affected route. This forced BGP to reconverge on alternate routes, and packets were dropped until convergence completed.
5. Impact Assessment
- Minimal. The longest outage lasted approximately 40 seconds, with users routed across the failover links.
6. Corrective Actions
- Our facility's provider will raise this with their upstream provider.
Last updated: February 26, 2025
Affected services
UK-A Core Stack
Updated
Feb 26 at 12:42am GMT
We have observed no additional packet loss. Our upstream provider is continuing to investigate.
Affected services
UK-A Core Stack
Updated
Feb 25 at 11:45pm GMT
We are investigating elevated levels of packet loss impacting UK-A's upstream routes. Two 30-second outages have been observed. We have escalated yesterday's support case with our facility's provider.
Affected services
UK-A Core Stack
Updated
Feb 25 at 01:11am GMT
No additional alerting has been received.
Started at: 25 Feb 2025 at 12:45am GMT
Resolved at: 25 Feb 2025 at 12:49am GMT (automatically)
Length: 1 minute
Started at: 24 Feb 2025 at 11:15pm GMT
Resolved at: 24 Feb 2025 at 11:16pm GMT (automatically)
Length: 1 minute
Affected services
UK-A Core Stack
Created
Feb 25 at 12:54am GMT
We are actively monitoring increased error rates impacting the internet uplinks at our UK-A facility. A support case has been opened with the facility provider.
Details:
We have detected a higher-than-normal level of connectivity issues affecting multiple endpoints. Two brief disruptions of approximately 30 seconds each have been observed within the past hour. No planned changes are currently in progress.
Affected services
UK-A Core Stack