-
Notifications
You must be signed in to change notification settings - Fork 8
Rendezvous hashing filesystem cache #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is an automated comment for commit 6092c08 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Rendezvous
part is mostly ok, just some small questions.
I am just a bit concerned because ClickHouse#77326 hasn't been merged yet and I believe some commits are missing. Do you plan to cherry-pick the most recent ones as well?
src/Storages/IStorageCluster.cpp
Outdated
std::vector<std::string> ids_of_hosts; | ||
for (const auto & shard : cluster->getShardsInfo()) | ||
{ | ||
if (shard.per_replica_pools.size() < 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per_replica_pools.empty()
|
||
size_t StorageObjectStorageStableTaskDistributor::getReplicaForFile(const String & file_path) | ||
{ | ||
if (!ids_of_nodes.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case would it not have a value? And why is empty != nullopt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must always have value, in ReadFromCluster::createExtension
it is called with ids_of_nodes
always.
About nullopt - it is the same:
if (var)
if (var.has_value())
if (var != std::nullopt)
First and second are synonyms - https://en.cppreference.com/w/cpp/utility/optional/operator_bool
Third - because comparation with nullopt calls bool operator:
/usr/include/c++/11$ grep -B 5 -A 5 'operator==' optional
...
// Comparisons with nullopt.
template<typename _Tp>
constexpr bool
operator==(const optional<_Tp>& __lhs, nullopt_t) noexcept
{ return !__lhs; }
...
template<typename _Tp>
constexpr bool
operator==(nullopt_t, const optional<_Tp>& __rhs) noexcept
{ return !__rhs; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't explain my question correctly. I was asking why not just have a vector and check if it is empty instead of an optional of vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that instead of doing:
std::optional<std::vector<std::string>> ids_of_nodes;
...
if (!ids_of_nodes.has_value())
you simply do:
std::vector<std::string> ids_of_nodes;
...
if (ids_of_nodes.empty())
I mean, it is ok to keep it as optional, just thought it could be simpler
return 0; | ||
|
||
/// Trivial case | ||
if (ids_of_nodes.value().size() < 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: extract ids_of_nodes into a non-optonal to avoid de-referencing all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is optional in IStorageCluster
because all except StorageObjectStorageCluster
do not use it.
To make it non-optional need to make a copy of list inside class. Or use smart pointer, than need to check if pointer has value.
Remove some dereferencings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't you just do the below inside this method?
auto ids_of_nodes = ids_of_nodes_optional.value_or_die()
and use ids_of_nodes
instead of repeating ids_of_nodes.value()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you updated the PR with that here: 46fe767#diff-68aa420a604c13765878c1fb39270f50ae0757b9b2f1b6609743632d2c7d0770R44.
that's ok then
/// Rendezvous hashing | ||
size_t best_id = 0; | ||
UInt64 best_weight = sipHash64(ids_of_nodes.value()[0] + file_path); | ||
for (size_t id = ids_of_nodes.value().size() - 1; id > 0; --id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why reverse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to direct order to avoid confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It might be a good idea to add a setting to turn on/ off this behavior just to be on the safe side?
But it looks ok as it is.
6092c08
to
763c860
Compare
95a407b
to
1c76962
Compare
1c76962
to
d6198a4
Compare
…system-cache Rendezvous hashing filesystem cache
Improvement object storage cache locality (#708) with rendezvous hashing.
https://en.wikipedia.org/wiki/Rendezvous_hashing
Main change in StorageObjectStorageStableTaskDistributor::getReplicaForFile method.
With original code distribution is not stable when host from beginning of cluster node list is gone or when cluster changes node order by some reason.
With rendezvous hashing best node based on node address (host:port) instead of node number. But a little bit heavy - need do calculate hash N times, one for each node.