Optimise HTTP API /queues endpoint #9874
5k quorum queues on a 3 node dev box. On this branch:
On main:
I expect a system with a real network and some load to benefit even more.
Listing queues with the HTTP API when there are many (1000s) of quorum queues could be excessively slow compared to the same scenario with classic queues.

This optimises various aspects of HTTP API queue listings.

For QQs it removes the expensive cluster-wide RPCs used to get the "online" status of each quorum queue. This was previously done _before_ paging and thus would perform a cluster-wide query for _each_ quorum queue in the vhost/system. This accounted for most of the slowness compared to classic queues.

Secondly, the query to separate the running from the down queues consisted of two separate queries that were later combined, when a single query would have sufficed.

This commit also includes a variety of other improvements and minor fixes discovered during testing and optimisation.

MINOR BREAKING CHANGE: quorum queues would previously only display one of two states: running or down. Now there is a new state called minority, which is emitted when the queue has at least one member running but cannot commit entries due to lack of quorum. Also, the quorum queue may transiently enter the down state when a node goes down and before it has elected a new leader.
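To make the three states concrete, here is a minimal sketch of how they could be told apart from member counts. It is only an illustration of the description above, not the code in this PR: the function name `quorum_queue_state`, its parameters, and the `has_leader` flag are all hypothetical, and it assumes quorum means a strict majority of members.

```python
def quorum_queue_state(total_members: int, running_members: int,
                       has_leader: bool = True) -> str:
    """Illustrative classification only; names and inputs are assumptions."""
    if total_members < 1 or not 0 <= running_members <= total_members:
        raise ValueError("invalid member counts")

    # No member is running at all: the queue is down.
    if running_members == 0:
        return "down"

    # Assumption: quorum is a strict majority of the members.
    quorum = total_members // 2 + 1

    # At least one member is running, but too few to commit entries.
    if running_members < quorum:
        return "minority"

    # Enough members are running, but no leader has been elected yet
    # (the transient "down" case mentioned above).
    if not has_leader:
        return "down"

    return "running"


if __name__ == "__main__":
    for running in (3, 2, 1, 0):
        print(running, quorum_queue_state(3, running))
```

Clients that previously treated anything other than running as down may want to handle minority explicitly.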
To allow callers to specify a subset of fields they'd like.
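On the client side, a listing can be narrowed in a similar spirit with the management HTTP API's `columns`, `page` and `page_size` query parameters. The sketch below only builds such a request URL; the helper name `queues_url` and the `localhost:15672` base address are assumptions for illustration, and it does not show the internal function this commit changes.

```python
from urllib.parse import quote, urlencode


def queues_url(base, vhost=None, columns=(), page=None, page_size=None):
    """Build a /api/queues URL (hypothetical helper, illustration only)."""
    path = "/api/queues"
    if vhost is not None:
        # The vhost is a single path segment, so "/" must become "%2F".
        path += "/" + quote(vhost, safe="")

    params = {}
    if page is not None:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size
    if columns:
        # Comma-separated list of the fields the caller wants back.
        params["columns"] = ",".join(columns)

    # Keep commas readable in the query string.
    query = urlencode(params, safe=",")
    return base.rstrip("/") + path + ("?" + query if query else "")


if __name__ == "__main__":
    print(queues_url("http://localhost:15672", vhost="/",
                     columns=["name", "state"], page=1, page_size=100))
```

Requesting only the fields that are needed, one page at a time, keeps both the response size and the server-side work small.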
There is no need to list all queues to check if the vhost exists.
Thanks, that should make a big difference. A 10s timeout is workable, a 1m timeout isn't. Any chance of this getting a backport to 3.12? (PS: I've been locked out of my own bug report. I've been getting more responses there from both of you but can't respond.)
There is no consensus on whether we want to backport this. #9892 is ready for review but we make no promises.
It was decided that this should be a 3.13-specific change.
Yeah, I suspected as much from the "breaking change" aspect, that makes sense. Thanks anyway, something to look forward to in 3.13. ;-)
Can confirm, the long timeout when a node is offline is now fixed in 3.13.0.
Might be useful to note this as fixed in #9522 as well. I'd do it myself but I've been locked out of my own bug report. ;-)
Done.