Closed
Description
Currently we're executing an expensive query more often than we expect. This caused 3 incidents over the last 2.5 weeks. We are still not sure why it is triggered this often (there are no traces/logs), but we can already improve the situation by:
- DB:
  - improve the query itself: our hypothesis is that the ORM generates a query like `SELECT COUNT(1) FROM (SELECT ... JOIN...)`, where the subquery is the reason for the slowness (MySQL tries to materialize the subquery result). Try writing direct SQL à la `SELECT COUNT(1) FROM d_b_workspace_instance AS wsi JOIN d_b_workspace AS ws ON ws.id = wsi.workspaceId WHERE ws.type = 'regular'`. Test this against a failover prod DB.
  - double-check we have an index on `workspace.type` in both prod DBs
- API: use different API/HTTP calls/requests for "config" and "telemetry data": e.g., don't execute the queries in case we are not sending the result anyway
- better observability: add tracing to the HTTP endpoint
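
A minimal sketch of how the API and observability points could fit together, assuming a handler split along the "config" / "telemetry data" line. Only the SQL text is taken from this issue; `Settings`, `RunCount`, `RecordSpan`, `handleConfig`, `handleTelemetry` and the span name are hypothetical stand-ins, not the actual server code or any tracing library's API.

```typescript
// All names below (Settings, RunCount, RecordSpan, handleConfig,
// handleTelemetry) are hypothetical; only the SQL text comes from the issue.

// Direct JOIN proposed in the issue, replacing the ORM-generated subquery.
const COUNT_REGULAR_INSTANCES_SQL =
  "SELECT COUNT(1) FROM d_b_workspace_instance AS wsi " +
  "JOIN d_b_workspace AS ws ON ws.id = wsi.workspaceId " +
  "WHERE ws.type = 'regular'";

interface Settings {
  sendTelemetry: boolean;
}

// Stand-in for the real DB access layer: runs a COUNT query, resolves to the count.
type RunCount = (sql: string) => Promise<number>;

// Stand-in for a tracing span: receives an operation name and its duration.
type RecordSpan = (name: string, durationMs: number) => void;

// "config" request: never touches the expensive query.
function handleConfig(settings: Settings): { sendTelemetry: boolean } {
  return { sendTelemetry: settings.sendTelemetry };
}

// "telemetry data" request: runs the query only if the result will be sent.
async function handleTelemetry(
  settings: Settings,
  runCount: RunCount,
  recordSpan: RecordSpan,
): Promise<{ totalInstances: number } | undefined> {
  if (!settings.sendTelemetry) {
    return undefined; // nothing will be sent, so skip the query entirely
  }
  const start = Date.now();
  try {
    const totalInstances = await runCount(COUNT_REGULAR_INSTANCES_SQL);
    return { totalInstances };
  } finally {
    // Recorded on success and on failure, so slow or failing calls show up.
    recordSpan("telemetry.countRegularInstances", Date.now() - start);
  }
}
```

With this shape, callers that only need the configuration never trigger the query, and every execution of the query leaves a timing record that would help explain why it runs as often as it does.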