vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32() #1153

@TarantoolBot

vshard.router.bucket_id() is deprecated, and each call logs a
warning. It still works, but will be removed in a future release.

Behaviour of the old bucket_id() function is now available as
vshard.router.bucket_id_strcrc32(). It works exactly like the old
function, but does not log a warning.

The new function bucket_id_mpcrc32() exists because the old
bucket_id() and the new bucket_id_strcrc32() are not consistent
for cdata numbers. In particular, they return 3 different values
for the same number depending on its type: a normal Lua number
(like 123), unsigned long long cdata (like 123ULL, or
ffi.cast('unsigned long long', 123)), and signed long long cdata
(like 123LL, or ffi.cast('long long', 123)).

    -- These calls return 3 different values:
    vshard.router.bucket_id(123)
    vshard.router.bucket_id(123LL)
    vshard.router.bucket_id(123ULL)

For float and double cdata (ffi.cast('float', number),
ffi.cast('double', number)) these functions return different
values even for equal numbers of the same floating point type.
This is because tostring() on a floating point cdata number
returns not the number itself, but a pointer to it, and the
pointer is different on each call.
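The string forms involved can be inspected directly. The sketch below assumes, as the explanation above implies, that the string-based functions hash what tostring() returns for the key. It does not call vshard at all and only prints the strings.

```lua
-- Runs under LuaJIT or as a Tarantool script; vshard is not needed.
local ffi = require('ffi')

-- Same numeric value, three different string forms.
print(tostring(123))     -- plain Lua number
print(tostring(123LL))   -- signed long long cdata, keeps the LL suffix
print(tostring(123ULL))  -- unsigned long long cdata, keeps the ULL suffix

-- Floating point cdata is printed as a type name plus an address,
-- so two objects holding the same value stringify differently.
local a = ffi.cast('double', 1.5)
local b = ffi.cast('double', 1.5)
print(tostring(a))
print(tostring(b))
```

Any hash taken over these strings inherits the differences, which is why equal numbers can land in different buckets.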

vshard.router.bucket_id_strcrc32() has the same inconsistencies as
the old bucket_id() and only drops the warning. Use it when you
need the old behaviour.

vshard.router.bucket_id_mpcrc32() is safer. It calculates a CRC32
of the MessagePack-encoded value, so the bucket id of an integer
does not depend on its Lua type. However, it may still return
different values for different floating point types. That is,
ffi.cast('float', number) may map to a bucket id that is not equal
to the one for ffi.cast('double', number). This can't be fixed,
because a float value, even when cast to double, may have a
garbage tail in its fraction.
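The effect of the encoding can be checked without a router. The sketch below assumes that bucket_id_mpcrc32() hashes the same bytes that Tarantool's built-in msgpack.encode() produces. That is an assumption about the implementation, so the code only compares the encodings and does not compute any bucket id. The hex() helper exists only for this example.

```lua
-- Runs as a Tarantool script; uses only built-in modules.
local ffi = require('ffi')
local msgpack = require('msgpack')

-- Helper for this example only: hex dump of a binary string.
local function hex(s)
    return (s:gsub('.', function(c)
        return string.format('%02x', c:byte())
    end))
end

-- Integers: one encoding regardless of the Lua-side type.
print(hex(msgpack.encode(123)))
print(hex(msgpack.encode(123LL)))
print(hex(msgpack.encode(123ULL)))

-- Floating point cdata: float and double are encoded as different
-- MessagePack types, so the bytes (and any hash over them) differ.
print(hex(msgpack.encode(ffi.cast('float', 0.1))))
print(hex(msgpack.encode(ffi.cast('double', 0.1))))
```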

As a rule, floating point keys should not be used to calculate a
bucket id.

P.S. #1: for a string key, bucket_id_mpcrc32() does not encode it
into MessagePack, but takes the hash directly from the string.
This does not affect consistency of the function, but makes it as
fast as bucket_id_strcrc32().

P.S. #2: be very careful if you store floating point types in a
space. When data is returned from a space, it is cast to a Lua
number. If that value has an empty fraction part, it will be
treated as an integer by bucket_id_mpcrc32(). So you need to do
explicit casts in such cases. Example of the problem:

    s = box.schema.create_space('test', {format = {{'id', 'double'}}})
    _ = s:create_index('pk')

    inserted = ffi.cast('double', 1)

    -- Value is stored as double.
    s:replace({inserted})

    -- But when returned to Lua, stored as Lua number, not cdata.
    returned = s:get({inserted}).id
    type(returned), returned
    ---
    - number
    - 1
    ...

    vshard.router.bucket_id_mpcrc32(inserted)
    ---
    - 1411
    ...
    vshard.router.bucket_id_mpcrc32(returned)
    ---
    - 1614
    ...
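
A minimal sketch of the explicit cast mentioned above. The helper name bucket_id_for_double is hypothetical and not part of vshard. It takes the hash function as an argument, so the snippet itself does not require a configured router.

```lua
local ffi = require('ffi')

-- Hypothetical helper (not part of vshard): force the key to double
-- cdata before handing it to the hash function passed as hash_fn.
local function bucket_id_for_double(hash_fn, value)
    return hash_fn(ffi.cast('double', value))
end

-- Intended usage on a configured router, with `returned` taken from
-- the example above:
-- bucket_id_for_double(vshard.router.bucket_id_mpcrc32, returned)
```

With the cast in place, the value read back from the space is passed as double cdata again, so it should be hashed the same way as `inserted`.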

Requested by @Gerold103 in tarantool/vshard@b035fd4.
