Consider collation_id support #17

@Totktonada

Status

Let's consider this issue as a proposal for discussion.

The idea

I'll cite README.md:

Differences to the bad side

Let's name it backlog :)

  • <...>
  • <collation_id> option is removed, use <collation> instead.

My thought was that when <collation> is present, there is no need to use <collation_id>. However, there are cases when an index definition does not contain <collation> (only <collation_id>), and the user has to resolve the numeric ID to a collation name somehow. This proposal is to reduce that complexity for the user by resolving <collation_id> to <collation> on the module side. There is a comment on this topic, and I interpret it as a request for this functionality.

The cases, when we have only numeric collation IDs, are the following:

  • tarantool-1.10 server (storage) and tarantool-1.10/2.* client (net.box).
  • tarantool-2.* server (storage) and tarantool-1.10 client (net.box).

Technical details

Let me summarize the differences between 1.10 and 2.* that are relevant in the context of this task:

  • Tarantool-2.* exposes the _vcollation system view [1], which is available to all users [2] (unlike the _collation system space).
  • The net.box client on tarantool-2.* fetches the _vcollation view from a server (if it exists) and replaces the <collation_id> field in index key parts with the <collation> field.
  • Tarantool-1.10 has only the 'unicode' and 'unicode_ci' collations by default, while tarantool-2.* provides many more collations by default. However, collations can also be added by a user.
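To picture the replacement that net.box performs on tarantool-2.*, here is a minimal sketch in plain Lua. It is not the actual net.box code: `replace_collation_ids` and the `id_to_name` table are hypothetical names, and the sample IDs are given only for illustration.

```lua
-- Hypothetical helper: returns a copy of the key parts where each numeric
-- `collation_id` is replaced with a `collation` name.
local function replace_collation_ids(parts, id_to_name)
    local res = {}
    for i, part in ipairs(parts) do
        local new_part = {}
        -- Copy every field except collation_id.
        for k, v in pairs(part) do
            if k ~= 'collation_id' then
                new_part[k] = v
            end
        end
        if part.collation_id ~= nil then
            local name = id_to_name[part.collation_id]
            if name == nil then
                error(('unknown collation_id: %s'):format(
                    tostring(part.collation_id)))
            end
            new_part.collation = name
        end
        res[i] = new_part
    end
    return res
end

-- Sample mapping (an assumption); in practice it would be built from the
-- tuples of the _vcollation view.
local id_to_name = {[1] = 'unicode', [2] = 'unicode_ci'}
local parts = {
    {fieldno = 1, type = 'unsigned'},
    {fieldno = 2, type = 'string', collation_id = 2},
}
local resolved = replace_collation_ids(parts, id_to_name)
-- resolved[2] now has a `collation` field and no `collation_id` field;
-- the input `parts` table is left untouched.
```

The sketch raises an error on an unknown ID instead of silently dropping the field, so a key part never loses its collation unnoticed.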

Possible solutions

(1) Lookup by _vcollation, then _collation

When <collation_id> is passed to tuple_keydef.new(), try to resolve it using _vcollation first, then _collation.
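The lookup order could be sketched as follows. This is only an illustration under stated assumptions: `resolve_collation_id` is a hypothetical helper, not part of the module; it expects a table of space-like objects with a `get` method (box.space could play that role on a server), and it assumes that the collation name is the second field of a tuple.

```lua
-- Hypothetical helper: resolves a numeric collation ID to a name, trying
-- _vcollation first and _collation second.
local function resolve_collation_id(collation_id, spaces)
    for _, space_name in ipairs({'_vcollation', '_collation'}) do
        local space = spaces[space_name]
        -- _vcollation may be absent (say, on 1.10), so skip missing spaces.
        if space ~= nil then
            -- A read without the required privilege raises an error,
            -- so guard the call and fall through to the next space.
            local ok, tuple = pcall(space.get, space, collation_id)
            if ok and tuple ~= nil then
                return tuple[2] -- assumed to hold the collation name
            end
        end
    end
    return nil, ('collation with id %s is not found'):format(
        tostring(collation_id))
end

-- A stand-in for box.space on a 1.10-like setup: there is no _vcollation
-- and _collation is readable. The rows are sample data.
local fake_spaces = {
    _collation = {
        rows = {[1] = {1, 'unicode'}, [2] = {2, 'unicode_ci'}},
        get = function(self, id) return self.rows[id] end,
    },
}
local name = resolve_collation_id(2, fake_spaces)
```

Returning `nil` plus a message (rather than raising) leaves it to the caller to decide whether an unresolved ID is fatal.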

We should highlight that read access to _collation must be explicitly granted in order to execute such a tuple_keydef.new() call on 1.10:

box.schema.user.grant(username, 'read', 'space', '_collation')

We should also highlight that the collation list on a client (or wherever the tuple_keydef.new() call is executed) should be in sync with the collations that appear in the key parts (usually taken from the server's index definitions).

If we can encounter collations other than 'unicode' and 'unicode_ci' (say, the default ones from tarantool-2.*) and the tuple_keydef instance is created on 1.10, we should insert all required collations into the _collation system space.
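Before creating a tuple_keydef instance, one could check whether the local collation list covers the key parts. A minimal sketch in plain Lua, where `missing_collation_ids` and the `known` table are hypothetical names and the IDs are sample values:

```lua
-- Hypothetical helper: lists the collation IDs used in the key parts that
-- are absent from the locally known collations (a table keyed by ID).
local function missing_collation_ids(parts, known)
    local missing, seen = {}, {}
    for _, part in ipairs(parts) do
        local id = part.collation_id
        if id ~= nil and known[id] == nil and not seen[id] then
            seen[id] = true -- report each ID only once
            table.insert(missing, id)
        end
    end
    return missing
end

-- Sample data: the local instance knows two collations, while the key
-- parts also reference an ID that is unknown locally.
local known = {[1] = 'unicode', [2] = 'unicode_ci'}
local key_parts = {
    {fieldno = 1, type = 'string', collation_id = 2},
    {fieldno = 2, type = 'string', collation_id = 100},
    {fieldno = 3, type = 'string', collation_id = 100},
    {fieldno = 4, type = 'unsigned'},
}
local missing = missing_collation_ids(key_parts, known)
```

A non-empty result means that the corresponding collations have to be added locally before the key parts can be used.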

(2) Do nothing

...and declare that a user should use an extra source of information about the collation name in an index definition if net.box does not provide it: say, consume a schema from tarantool/ddl.

The note above about collations other than 'unicode' and 'unicode_ci' on tarantool-1.10 applies here as well.

(3, 4, 5) Other (bad) options

  • (3) Just look in _vcollation: no 1.10 support, so it makes no sense.
  • (4) Just look in _collation: requires unnecessary permissions on tarantool-2.*, which is undesirable.
  • (5) Hardcode 'unicode' and 'unicode_ci': a partial solution that may be confusing and may bite us in the future.

Which solution is the best?

I don't like (3), (4) and (5): see the reasons above. But (1) and (2) look more or less okay. I would like to collect more opinions before deciding.

Summon @olegrok.

Code examples

May be useful if we decide to implement (1).

  • coll_id_by_name(). We need the reverse mapping, so just use the index by IDs instead of the index by names.

Footnotes

  1. Strictly speaking, _vcollation exists on tarantool-2.* only when it works on a recent schema, not on top of a schema from 1.10.

  2. Strictly speaking, access to _vcollation is granted to users that have the 'public' role, which is assigned to all users by default but can be revoked.