Description
Map-reduce in vshard: vshard.router.map_callrw()
Product: Tarantool, vshard (external module)
Since: 0.1.17 (vshard version; vshard is compatible with Tarantool versions >= 1.9)
Audience/target: developers; Tarantool implementation teams
Root document: https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_index/
https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_api/#vshard-api-reference-router-public-api
SME: @Gerold103
Peer reviewer: @
Details
vshard.router.map_callrw()
implements consistent map-reduce over the entire cluster.
Consistency means that all the data was accessible and did not move
during the execution of the map requests.
It is useful when you need to access potentially all the data in the
cluster, or a huge number of buckets scattered across the instances,
for which individual vshard.router.call() requests would take
too long.
map_callrw()
takes the name of the function to call on the storages,
the arguments as an array, and an optional options map.
The only supported option for now is timeout, which is applied to
the entire call, not to the individual calls on each storage.
vshard.router.map_callrw(func_name, args[, {timeout = <seconds>}])
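For example, a call might look like this (the function name
'storage_func', the arguments, and the timeout value are illustrative):

```lua
-- 'storage_func' is a hypothetical function registered on every storage.
local res, err, uuid = vshard.router.map_callrw(
    'storage_func',    -- name of the function to call on the masters
    {1, 2, 3},         -- arguments, passed as an array
    {timeout = 10}     -- applied to the entire call, not per storage
)
```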
The chosen function is called on the master node of each
replicaset with the given arguments.
In case of success vshard.router.map_callrw()
returns a map with
replicaset UUIDs as keys and results of the user's function as
values, like this:
{uuid1 = {res1}, uuid2 = {res2}, ...}
If the function returns nil
or box.NULL
on one of the
storages, its result is not included in the result map.
In case of failure it returns nil, an error object, and optionally
the UUID of the replicaset where the error happened. The UUID may be
missing if the error is not related to a concrete replicaset.
For instance, the method fails if not all buckets were found, even
if all replicasets were scanned successfully.
Handling the result looks like this:
local res, err, uuid = vshard.router.map_callrw(...)
if not res then
    -- Error.
    -- 'err' - error object. 'uuid' - optional UUID of the replicaset
    -- where the error happened.
    ...
else
    -- Success.
    for uuid, value in pairs(res) do
        ...
    end
end
Map-Reduce in vshard works in 3 stages: Ref, Map, and Reduce. Ref is
an internal stage that ensures data consistency
during the execution of the user's function on all nodes.
Reduce is not performed by vshard: it is what the user's code does
with the results of map_callrw().
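As a sketch of the Reduce stage, assuming the called function returns a
number on each storage (the function name 'get_count' is hypothetical),
the user's code might aggregate the result map like this:

```lua
-- A minimal Reduce sketch: sum the per-replicaset results into one total.
-- 'get_count' is a hypothetical function registered on every storage.
local res, err, uuid = vshard.router.map_callrw('get_count', {}, {timeout = 5})
if not res then
    error(err)
end
local total = 0
for _, value in pairs(res) do
    -- Each value is an array with the function's return values.
    total = total + value[1]
end
```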
Consistency, as it is defined for map-reduce, is not compatible
with rebalancing, because any bucket move would make the sender
and receiver nodes 'inconsistent': it would not be possible to call a
function on them that could simply access all the data without
doing vshard.storage.bucket_ref()
.
This makes the Ref stage quite intricate, as it must work together with
the rebalancer to ensure that neither blocks the other.
For this purpose, the storage has a scheduler specifically for bucket
moves and storage refs, which shares storage time between them fairly.
The definition of fairness depends on how long and how frequent the moves
and refs are. This can be configured using the storage options
sched_move_quota
and sched_ref_quota
. See more details about
them in the corresponding doc section.
The scheduler configuration may affect map-reduce requests if they
are used extensively during rebalancing.
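A configuration sketch, assuming the quotas are set through the usual
vshard.storage.cfg() call (the numeric values below are illustrative,
not recommendations):

```lua
local cfg = {
    -- How many bucket moves may run in a row before yielding to refs.
    sched_move_quota = 1,
    -- How many storage refs may be taken in a row before yielding to moves.
    sched_ref_quota = 300,
    sharding = ...,  -- the rest of the usual vshard configuration
}
vshard.storage.cfg(cfg, instance_uuid)
```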
Keep in mind that it is not a good idea to use too big a timeout
for map_callrw()
, because the router will try to block
bucket moves for the given timeout on all storages, and if
something goes wrong, the block will remain for the entire
timeout. In particular, timeouts longer than, say, a minute are a
very bad idea except for testing.
It is also important to remember that map_callrw()
works only on masters, not on replicas. This makes it unusable
if at least one replicaset has its master node down.
- Related development issues and/or commits: tarantool/vshard@a394e3f
Definition of done
- Create a new page with the description of map-reduce implementation in
vshard
- Restructure API Reference page