Conversation

@jmcarp (Contributor) commented Nov 7, 2025

We use qorb as the connection pooler for ClickHouse within oximeter, and inherit qorb's default pool size (`max_slots`) of 16. We may be saturating that pool for some OxQL workloads, such as the proposed otel receiver, which runs many small queries in parallel against ClickHouse. This patch makes the pool size configurable so that we can adjust it if necessary.

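For context, a minimal sketch of what the knob might look like on the config side, assuming a serde-deserialized struct; the struct name, derive, and `Option<usize>` type are illustrative, and only the `max_slots` field name comes from the patch below:

```rust
use serde::Deserialize;

/// Illustrative config section for the timeseries database. Only the
/// `max_slots` field is taken from the patch; the struct name and type
/// are assumptions for the sake of the example.
#[derive(Debug, Deserialize)]
struct TimeseriesDbConfig {
    /// Maximum number of slots in the qorb connection pool for
    /// ClickHouse. When unset, qorb's default of 16 applies.
    max_slots: Option<usize>,
}
```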
@jmcarp requested a review from bnaecker November 7, 2025 17:07
```rust
};
let mut timeseries_policy = Policy::default();
// Start from qorb's defaults and override only the configured cap, so
// the other policy fields keep their default values.
if let Some(max_slots) = config.pkg.timeseries_db.max_slots {
    timeseries_policy.max_slots = max_slots;
}
```
Collaborator:

This is the maximum across all backends. Is that what you want to cap? Or a maximum for each backend?

Contributor (Author):

Here's what I was thinking. I think I have queries queueing up when I run the otel receiver against oximeter, and I wanted to see if increasing the connection pool size might help. I don't have a good mental model of qorb: are there multiple backends when we're managing a connection pool for a single database instance, as we are here? For my particular use case, I'm interested in bumping whichever cap is throttling my queries, but maybe we should add knobs for both caps for generality.

As an aside, is there a simple way to check whether we're saturating either the policy or backend's max connections?

Collaborator:

We just recently switched back to single-node ClickHouse, in which case there is one backend and so the total cap on slots and the count per backend are the same. When we go back to multinode, we probably want to configure this on a per-backend basis.
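To make the distinction concrete, a purely illustrative sketch; these are not qorb's actual types, and the field names here are assumptions:

```rust
/// Illustrative only: the two caps under discussion.
struct PoolCaps {
    /// Total claims allowed across every backend in the pool; this is
    /// what `Policy::max_slots` bounds.
    total: usize,
    /// Claims allowed against any single backend.
    per_backend: usize,
}

fn main() {
    // Single-node ClickHouse: one backend, so the two caps coincide.
    let single = PoolCaps { total: 16, per_backend: 16 };
    // Hypothetical multinode: a total cap of 16 no longer pins down
    // how many connections any one replica may receive.
    let multi = PoolCaps { total: 16, per_backend: 16 / 3 };
    assert_eq!(single.total, single.per_backend);
    assert!(multi.per_backend < multi.total);
}
```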

> As an aside, is there a simple way to check whether we're saturating either the policy or backend's max connections?

I'd probably use the USDT probes to do this. For example, if there is substantial time between `claim-start` and `claim-done`, then the connections are all in use, since we're spending time queued. You could also use `handle-claimed` and `handle-returned` to estimate the spare capacity in the pool over time.
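If the probes aren't convenient, a rough application-side check is to time the claim at the call site. A minimal fragment, assuming a `pool` variable whose async `claim()` matches qorb's, and a hypothetical 10 ms threshold:

```rust
use std::time::{Duration, Instant};

// Time spent inside `claim()` is time spent queued for a connection;
// consistently long waits mean all slots are in use. This mirrors the
// claim-start / claim-done probe pair described above.
let start = Instant::now();
let handle = pool.claim().await?;
if start.elapsed() > Duration::from_millis(10) {
    eprintln!("claim waited {:?}; pool may be saturated", start.elapsed());
}
// Dropping `handle` returns the slot to the pool, which is what the
// handle-claimed / handle-returned probe pair observes.
```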
