Web Application Outage

Resolved in 6 hours

Incident timeline

Operational
Resolved
Marking as resolved. We are continuing to investigate to identify and confirm the root cause so that mitigations can be put in place.
2023-04-22 08:17:22 UTC
Operational
Investigating
Investigating aggressive vulnerability scanning from the Internet, which may have caused Puma/Rack to stop responding. NGINX was serving throughout and Rails continued to process background jobs, but Puma/Rack appears to have stopped serving content. HoneyBadger alerts and the Rails production log show scanning for vulnerabilities specific to platforms we do not run. This kind of scanning is common on the Internet and we encounter it constantly, so it is unusual for Puma/Rack to stop serving because of it. Continuing to investigate.
2023-04-22 08:09:12 UTC
Operational
Investigating
The problem was fixed with a hard Puma restart (the processes had to be killed manually). Investigating.
2023-04-22 06:46:00 UTC
Operational
Resolved
Marking as resolved while continuing to investigate.
2023-04-22 06:46:00 UTC
Problem detected
Investigating
The main PanSift application running Ruby on Rails stopped responding at 2023-04-22 03:29:46 UTC. A restart of the Rails service fixed the issue at 2023-04-22 07:46:04 UTC. We are investigating now. We apologise for the 4h 16m 18s outage. No agent data was affected or lost, as the ingestion tier is separate from the web tier. Only the ability to view data was affected.
2023-04-22 02:29:00 UTC
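
For reference, the outage duration quoted in the initial report can be checked from the two timestamps it gives. The following is a minimal Python sketch, assuming the whole-second UTC times stated in that report; the helper names `parse_utc` and `outage_duration` are illustrative and not part of any PanSift tooling. From those timestamps the elapsed time works out to 4h 16m 18s; a monitor that records sub-second times could report a figure one second either side.

```python
from datetime import datetime, timezone

# Timestamp layout used in the incident entries, e.g. "2023-04-22 03:29:46"
FMT = "%Y-%m-%d %H:%M:%S"


def parse_utc(stamp: str) -> datetime:
    """Parse a whole-second timestamp and mark it as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


def outage_duration(start: str, end: str) -> str:
    """Return the elapsed time between two UTC timestamps as 'Hh Mm Ss'."""
    total_seconds = int((parse_utc(end) - parse_utc(start)).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Times taken from the initial report: app stopped responding, then restart fixed it.
duration = outage_duration("2023-04-22 03:29:46", "2023-04-22 07:46:04")
print(duration)  # 4h 16m 18s
```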