rabbit_nodes: Add list functions to clarify which nodes we are interested in #7058

dumbbell · 2023-01-26T16:02:25Z

So far, we had the following functions to list nodes in a RabbitMQ cluster:

rabbit_mnesia:cluster_nodes/1 to get members of the Mnesia cluster; the argument was used to select members (all members or only those running Mnesia and participating in the cluster)
rabbit_nodes:all/0 to get all members of the Mnesia cluster
rabbit_nodes:all_running/0 to get all members who currently run Mnesia

Basically:

rabbit_nodes:all/0 calls rabbit_mnesia:cluster_nodes(all)
rabbit_nodes:all_running/0 calls rabbit_mnesia:cluster_nodes(running)

We also have:

rabbit_node_monitor:alive_nodes/1 which filters the given list of nodes to only select those currently running Mnesia
rabbit_node_monitor:alive_rabbit_nodes/1 which filters the given list of nodes to only select those currently running RabbitMQ

Most of the code uses rabbit_mnesia:cluster_nodes/1 or the rabbit_nodes:all*/0 functions. rabbit_mnesia:cluster_nodes(running) or rabbit_nodes:all_running/0 is often used as a close approximation of "all cluster members running RabbitMQ". This list might be incorrect in times where a node is joining the clustered or is being worked on (i.e. Mnesia is running but not RabbitMQ).

With Khepri, there won't be the same possible approximation because we will try to keep Khepri/Ra running even if RabbitMQ is stopped to expand/shrink the cluster.

So in order to clarify what we want when we query a list of nodes, this patch introduces the following functions:

rabbit_nodes:list_members/0 to get all cluster members, regardless of their state
rabbit_nodes:list_reachable/0 to get all cluster members we can reach using Erlang distribution, regardless of the state of RabbitMQ
rabbit_nodes:list_running/0 to get all cluster members who run RabbitMQ, regardless of the maintenance state
rabbit_nodes:list_serving/0 to get all cluster members who run RabbitMQ and are accepting clients

In addition to the list functions, there are the corresponding rabbit_nodes:is_*(Node) checks and rabbit_nodes:filter_*(Nodes) filtering functions.

The code is modified to use these new functions. One possible significant change is that the new list functions will perform RPC calls to query the nodes' state, unlike rabbit_mnesia:cluster_nodes(running).

dumbbell · 2023-02-02T11:25:30Z

I will wait for #6821 to be merged before in order to not interfere with the review and rebase of that branch.

dumbbell · 2023-02-08T13:43:27Z

This patch is ready for review again.

I hope it doesn't introduce any regression, it's difficult to test.

If you think of a better naming, please share!

…sted in So far, we had the following functions to list nodes in a RabbitMQ cluster: * `rabbit_mnesia:cluster_nodes/1` to get members of the Mnesia cluster; the argument was used to select members (all members or only those running Mnesia and participating in the cluster) * `rabbit_nodes:all/0` to get all members of the Mnesia cluster * `rabbit_nodes:all_running/0` to get all members who currently run Mnesia Basically: * `rabbit_nodes:all/0` calls `rabbit_mnesia:cluster_nodes(all)` * `rabbit_nodes:all_running/0` calls `rabbit_mnesia:cluster_nodes(running)` We also have: * `rabbit_node_monitor:alive_nodes/1` which filters the given list of nodes to only select those currently running Mnesia * `rabbit_node_monitor:alive_rabbit_nodes/1` which filters the given list of nodes to only select those currently running RabbitMQ Most of the code uses `rabbit_mnesia:cluster_nodes/1` or the `rabbit_nodes:all*/0` functions. `rabbit_mnesia:cluster_nodes(running)` or `rabbit_nodes:all_running/0` is often used as a close approximation of "all cluster members running RabbitMQ". This list might be incorrect in times where a node is joining the clustered or is being worked on (i.e. Mnesia is running but not RabbitMQ). With Khepri, there won't be the same possible approximation because we will try to keep Khepri/Ra running even if RabbitMQ is stopped to expand/shrink the cluster. So in order to clarify what we want when we query a list of nodes, this patch introduces the following functions: * `rabbit_nodes:list_members/0` to get all cluster members, regardless of their state * `rabbit_nodes:list_reachable/0` to get all cluster members we can reach using Erlang distribution, regardless of the state of RabbitMQ * `rabbit_nodes:list_running/0` to get all cluster members who run RabbitMQ, regardless of the maintenance state * `rabbit_nodes:list_serving/0` to get all cluster members who run RabbitMQ and are accepting clients In addition to the list functions, there are the corresponding `rabbit_nodes:is_*(Node)` checks and `rabbit_nodes:filter_*(Nodes)` filtering functions. The code is modified to use these new functions. One possible significant change is that the new list functions will perform RPC calls to query the nodes' state, unlike `rabbit_mnesia:cluster_nodes(running)`.

michaelklishin · 2023-02-14T02:06:18Z

It would greatly simplify backports if key functions of this API that can be aliases to the ones they replace in main, were added to v3.11.x and v3.10.x.

michaelklishin · 2023-02-14T16:31:26Z

#7282 is an example of a small change that failed to backport because of rabbit_db_*

dumbbell · 2023-02-15T10:03:15Z

Perhaps it should not be backported in the first place? It doesn't look like a bug fix to me.

…e/0` ... instead of using an internal implementation. References #7058.

... instead of using an internal implementation. The plugins are `rabbitmq_peer_discovery_common` and `rabbitmq_sharding`. References #7058.

…ic code [Why] The partition detection code defines a partitioned node as an Erlang node running RabbitMQ but which is not among the Mnesia running nodes. Since #7058, `rabbit_node_monitor` uses the list functions exported by `rabbit_nodes` for everything, except the partition detection code which is Mnesia-specific and relies on `rabbit_mnesia:cluster_nodes/1`. Unfortunately, we saw regressions in the Jepsen testsuite during the 3.12.0 release cycle only because that testsuite is not executed on `main`. It happens that the partition detection code is using `rabbit_nodes` list functions in two places where it should have continued to use `rabbit_mnesia`. [How] The fix bug fix simply consists of reverting the two calls to `rabbit_nodes` back to calls to `rabbit_mnesia` as it used to do. This seems to improve the situation a lot in the manual testing. This code will go away with our use of Mnesia in the future, so it's not a problem to call `rabbit_mnesia` here.

…ic code [Why] The partition detection code defines a partitioned node as an Erlang node running RabbitMQ but which is not among the Mnesia running nodes. Since #7058, `rabbit_node_monitor` uses the list functions exported by `rabbit_nodes` for everything, except the partition detection code which is Mnesia-specific and relies on `rabbit_mnesia:cluster_nodes/1`. Unfortunately, we saw regressions in the Jepsen testsuite during the 3.12.0 release cycle only because that testsuite is not executed on `main`. It happens that the partition detection code is using `rabbit_nodes` list functions in two places where it should have continued to use `rabbit_mnesia`. [How] The fix bug fix simply consists of reverting the two calls to `rabbit_nodes` back to calls to `rabbit_mnesia` as it used to do. This seems to improve the situation a lot in the manual testing. This code will go away with our use of Mnesia in the future, so it's not a problem to call `rabbit_mnesia` here. (cherry picked from commit 22516aa)

dumbbell added this to the 3.12.0 milestone Jan 26, 2023

dumbbell self-assigned this Jan 26, 2023

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch 2 times, most recently from 18156b3 to 7c19e23 Compare January 27, 2023 16:12

dumbbell marked this pull request as ready for review January 30, 2023 09:46

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch from 7c19e23 to 5ca86d9 Compare January 30, 2023 09:48

michaelklishin previously approved these changes Feb 2, 2023

View reviewed changes

dumbbell dismissed michaelklishin’s stale review via c87d10b February 2, 2023 09:44

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch 2 times, most recently from c87d10b to ff6a4ac Compare February 2, 2023 10:19

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch 2 times, most recently from e2ab72b to 1c2c156 Compare February 6, 2023 11:07

dumbbell marked this pull request as draft February 7, 2023 13:27

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch 4 times, most recently from ee94e5e to 1d7db96 Compare February 8, 2023 10:59

dumbbell marked this pull request as ready for review February 8, 2023 13:42

dumbbell mentioned this pull request Feb 8, 2023

rabbit_db: Add clustering-related functions #6882

Merged

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch 5 times, most recently from 77a0574 to 6a9e057 Compare February 13, 2023 10:41

dumbbell force-pushed the add-node-lists-functions-to-clarify-intent branch from 6a9e057 to d656371 Compare February 13, 2023 11:59

michaelklishin approved these changes Feb 14, 2023

View reviewed changes

michaelklishin merged commit d0dc951 into main Feb 14, 2023

michaelklishin deleted the add-node-lists-functions-to-clarify-intent branch February 14, 2023 02:06

dumbbell added a commit that referenced this pull request Feb 21, 2023

rabbitmq_peer_discovery_common: Use the `rabbit_nodes:list_unreachabl…

dd3c356

…e/0` ... instead of using an internal implementation. References #7058.

dumbbell added a commit that referenced this pull request Feb 21, 2023

rabbitmq_peer_discovery_common: Use the `rabbit_nodes:list_unreachabl…

e74146a

…e/0` ... instead of using an internal implementation. References #7058.

dumbbell added a commit that referenced this pull request Feb 21, 2023

Use rabbit_nodes to list nodes in two plugins

21cff15

... instead of using an internal implementation. The plugins are `rabbitmq_peer_discovery_common` and `rabbitmq_sharding`. References #7058.

dumbbell added a commit that referenced this pull request Feb 21, 2023

Use rabbit_nodes to list nodes in two plugins

f30b43c

... instead of using an internal implementation. The plugins are `rabbitmq_peer_discovery_common` and `rabbitmq_sharding`. References #7058.

dumbbell added a commit that referenced this pull request Feb 21, 2023

Use rabbit_nodes to list nodes in two plugins

fab106b

... instead of using an internal implementation. The plugins are `rabbitmq_peer_discovery_common` and `rabbitmq_sharding`. References #7058.

dumbbell added a commit that referenced this pull request Feb 22, 2023

Use rabbit_nodes to list nodes in two plugins

cd2b4c7

... instead of using an internal implementation. The plugins are `rabbitmq_peer_discovery_common` and `rabbitmq_sharding`. References #7058.

This was referenced Mar 8, 2023

Fix definitions import for 3.12.x #7533

Closed

Version 3.12 does not import a definitions file correctly #7532

Closed

dumbbell mentioned this pull request Mar 21, 2023

rabbit_node_monitor: Use rabbit_mnesia in partition handling-specific code #7686

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

rabbit_nodes: Add list functions to clarify which nodes we are interested in #7058

rabbit_nodes: Add list functions to clarify which nodes we are interested in #7058

Uh oh!

dumbbell commented Jan 26, 2023 •

edited

Loading

Uh oh!

dumbbell commented Feb 2, 2023

Uh oh!

dumbbell commented Feb 8, 2023

Uh oh!

michaelklishin commented Feb 14, 2023

Uh oh!

michaelklishin commented Feb 14, 2023

Uh oh!

dumbbell commented Feb 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rabbit_nodes: Add list functions to clarify which nodes we are interested in #7058

rabbit_nodes: Add list functions to clarify which nodes we are interested in #7058

Uh oh!

Conversation

dumbbell commented Jan 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dumbbell commented Feb 2, 2023

Uh oh!

dumbbell commented Feb 8, 2023

Uh oh!

michaelklishin commented Feb 14, 2023

Uh oh!

michaelklishin commented Feb 14, 2023

Uh oh!

dumbbell commented Feb 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dumbbell commented Jan 26, 2023 •

edited

Loading