Commercial Customers: TCP timeouts from Influx Cloud API
Marking as resolved and keeping an eye on logs for future timeout optimizations.
2023-01-13 13:44:58 UTC
Web application `Net::Timeout` errors have subsided. We will keep an eye on the logs and also tune the `open_timeout` and `read_timeout` settings of Net::HTTP (which the Influx API gem uses under the hood).
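For reference, a minimal sketch of the two Net::HTTP knobs in question; the endpoint path and timeout values here are illustrative, not our production settings:

```ruby
require "net/http"
require "uri"

# Illustrative values only. The Influx API gem builds a Net::HTTP client
# roughly like this under the hood; these are the two timeouts we plan to tune.
uri = URI("https://us-west-2-2.aws.cloud2.influxdata.com/health")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.open_timeout = 5   # seconds allowed for the TCP connection to establish
http.read_timeout = 15  # seconds allowed for each read from the socket
```

Raising `open_timeout` gives slow TCP handshakes (the symptom we are seeing) more headroom, while `read_timeout` bounds how long a dashboard query will block once connected.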
We will also watch the haproxy logs; the health-check interval has been increased and the checks now look to be running clear.
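For context, a hedged sketch of what the relevant haproxy backend stanza might look like; the backend name, server label, and timing values are illustrative, not our actual configuration:

```
backend influx_cloud
  # `inter` is the health-check interval; raising it (e.g. 2s -> 10s) makes
  # intermittent TCP timeouts less likely to flap the backend up and down.
  # `fall`/`rise` set how many consecutive checks mark a server down/up.
  server influx us-west-2-2.aws.cloud2.influxdata.com:443 check inter 10s fall 3 rise 2
```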
2023-01-13 12:38:30 UTC
Influx engineers report no issues on their end. To address this, we are increasing the timeouts on our web application's API calls to Influx and on our proxy health checks.
2023-01-13 11:22:47 UTC
We are seeing TCP timeouts against Influx's commercial API endpoint infrastructure in AWS Oregon https://cloud2.influxdata.com (https://us-west-2-2.aws.cloud2.influxdata.com/). This affects paid users on our web application "app.pansift.com" and their agent ingestion to "ingest5", both of which leverage the Influx cloud. No data is lost: this is an intermittent transport issue affecting API calls, manifesting as "TSDB" errors on the dashboard. Reloading the dashboard should clear any TSDB errors on the web application, and agents will buffer unwritten data until it can be written.
Detail: Our logging shows intermittent `Net::Timeout`s from our web app dashboard (which uses the Influx API). Additionally, our reverse proxy's ongoing standard Layer 4 checks to the Influx commercial cloud are failing intermittently. Both of our hosts run on DigitalOcean, one in San Francisco and one in Amsterdam, and both are seeing issues reaching Influx's `us-west-2-2.aws.cloud2`. It may be a transit issue between ASNs (DigitalOcean: AS14061 to AWS: AS16509), but we are still investigating, including opening support tickets with the vendor.
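For illustration, a minimal Ruby sketch of the kind of Layer 4 probe our proxy checks perform: a timed TCP connect. The helper name `tcp_connect_ms` and the timeout value are hypothetical, not part of our actual tooling:

```ruby
require "socket"

# Measure how long a bare TCP connect to host:port takes, in milliseconds.
# Returns nil if the connection times out or is refused. This mirrors a
# Layer 4 health check: no TLS, no HTTP, just the handshake.
def tcp_connect_ms(host, port, timeout: 5)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Socket.tcp(host, port, connect_timeout: timeout) { }  # connect, then close
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(1)
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED, SocketError => e
  warn "probe failed: #{e.class}"
  nil
end
```

Running a probe like this from both the San Francisco and Amsterdam hosts against the Influx endpoint would show whether connect failures correlate across both transit paths, which would point toward the far end rather than either local network.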