Brief Web Application Outage: Agents and Ingestion Unaffected

Resolved in a minute

Incident timeline

Partial outage
Resolved
Impact: No impact to metrics, agents, or ingestion. The web application portion of PanSift (i.e. the dashboards and agent graphs) was unavailable for ~70s because the `puma` web server failed to restart after a recent deploy. The outage was resolved almost immediately once a legacy controller file was fully removed.

Details: The latest deployment temporarily prevented the `puma` web server from restarting due to a renamed and deprecated file that had not been fully removed. That file stopped the `zeitwerk` gem from eager loading the latest code, so the application could not start serving web pages after the restart. The problem was not caught in local development or staging (where the puma configs are identical) and only manifested in production. It was detected immediately by Honeybadger and by the operator on the command line (even though deploys are automated).
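For illustration only, here is a minimal plain-Ruby sketch of the kind of file-name-to-constant mismatch that can stop eager loading. It is not PanSift's code and does not use `zeitwerk` itself: the helpers `expected_constant` and `naming_mismatches`, and the file names in the demo, are hypothetical, and the check is a simplified approximation of the default naming convention (a file `foo_bar.rb` is expected to define `FooBar`).

```ruby
require "tmpdir"

# Map a path relative to the autoload root to the constant name the default
# convention would expect, e.g. "admin/user_reports.rb" -> "Admin::UserReports".
# (Simplified assumption: no custom inflections or acronyms.)
def expected_constant(relative_path)
  relative_path.delete_suffix(".rb").split("/").map do |segment|
    segment.split("_").map(&:capitalize).join
  end.join("::")
end

# Return the files under root that do not appear to define the class or
# module their file name implies. This is a textual check, not real loading.
def naming_mismatches(root)
  Dir.glob("**/*.rb", base: root).sort.reject do |relative_path|
    leaf = expected_constant(relative_path).split("::").last
    source = File.read(File.join(root, relative_path))
    source.match?(/^\s*(class|module)\s+(\w+::)*#{leaf}\b/)
  end
end

# Demo with hypothetical files: a controller was renamed, but the old file
# was left behind and no longer defines the constant its name implies.
Dir.mktmpdir do |root|
  File.write(File.join(root, "reports_controller.rb"), "class ReportsController\nend\n")
  File.write(File.join(root, "legacy_reports_controller.rb"), "class ReportsController\nend\n")
  puts naming_mismatches(root).inspect
  # prints ["legacy_reports_controller.rb"]
end
```

In a Rails application, the built-in `bin/rails zeitwerk:check` task performs the real version of this check by eager loading the project, so running it before a deploy is one way to surface this class of error before the web server restarts.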
2023-01-27 13:51:00 UTC