[telemetry] Fix telemetry causing high DB CPU load #8638

Closed
@geropl

Currently we're executing an expensive query more often than we expect. This caused 3 incidents over the course of the last 2.5 weeks. We are still not sure why it is triggered this often (there are no traces/logs), but we can still improve the situation by:

  1. DB:
     - improve the query itself: our hypothesis is that the ORM generates a query like `SELECT COUNT(1) FROM (SELECT ... JOIN...)`, where the subquery is the reason for the slowness (MySQL tries to materialize the table). Try writing direct SQL à la `SELECT COUNT(1) FROM d_b_workspace_instance AS wsi JOIN d_b_workspace AS ws ON ws.id = wsi.workspaceId WHERE ws.type = 'regular'`. Test this against a failover prod DB.
     - double-check we have an index on workspace.type in both prod DBs
  2. API: use separate API/HTTP requests for "config" and "telemetry data", e.g., don't execute the queries if we are not sending the result anyway
  3. better observability: add tracing to the HTTP endpoint
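To illustrate the API point, here is a minimal sketch of what splitting "config" from "telemetry data" could look like. All names (`TelemetrySettings`, `getTelemetryConfig`, `getTelemetryData`, `CountQuery`) are hypothetical and not taken from the existing code base; the count query is passed in as a plain function so the sketch stays self-contained, whereas a real implementation would presumably be async and hit the DB.

```typescript
// Hypothetical types; the real settings/data shapes may differ.
interface TelemetrySettings {
  sendTelemetry: boolean;
}

interface TelemetryData {
  totalRegularInstances: number;
}

// Stand-in for the expensive COUNT query (assumption: synchronous here
// only to keep the sketch small; a real DB call would return a Promise).
type CountQuery = () => number;

// "config" request: cheap, never touches the DB.
function getTelemetryConfig(settings: TelemetrySettings): TelemetrySettings {
  return { sendTelemetry: settings.sendTelemetry };
}

// "telemetry data" request: runs the expensive query only if the result
// would actually be sent.
function getTelemetryData(
  settings: TelemetrySettings,
  countRegularInstances: CountQuery,
): TelemetryData | undefined {
  if (!settings.sendTelemetry) {
    // Telemetry is disabled, so skip the query entirely.
    return undefined;
  }
  return { totalRegularInstances: countRegularInstances() };
}
```

With this split, a caller that only needs to know whether telemetry is enabled calls `getTelemetryConfig` and causes no DB load at all.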

/cc @corneliusludmann

Labels

- team: delivery (Issue belongs to the self-hosted team)
- type: bug (Something isn't working)