Commit 179adca

patiencedaur, xuniq, sergepetrenko
authored
Rewrite iproto protocol description (#3151)
Resolves #1662
Resolves #2422
Resolves #2442
Resolves #2467
Resolves #2526
Part of #2416

* Split the Binary protocol page into several sections
* Move examples to How-to
* Correct the description of the greeting Resolves #2467
* Elaborate on IPROTO_OPS and the different uses of IPROTO_TUPLE
* Provide missing info on keys
* Clarify Replication items in terms of whether they are a request, response, map, key, etc. Groom the structure accordingly
* Add PROMOTE and DEMOTE descriptions
* Bring all SQL-related info into one document
* Clarify that IPROTO_REQUEST_TYPE is used in requests and responses alike
* Improve formatting
* Add tables for uniformity
* Add UML diagram illustrations in the SVG format

Co-authored-by: Kseniia Antonova <[email protected]>
Co-authored-by: Serge Petrenko <[email protected]>
1 parent f72554c commit 179adca

File tree

121 files changed

+7492
-2950
lines changed

doc/dev_guide/internals/box_protocol.rst

Lines changed: 29 additions & 2098 deletions
Large diffs are not rendered by default.

doc/dev_guide/internals/file_formats.rst

Lines changed: 43 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,12 @@
11
.. _internals-data_persistence:
22

3-
--------------------------------------------------------------------------------
43
File formats
5-
--------------------------------------------------------------------------------
4+
============
65

76
.. _internals-wal:
87

9-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
108
Data persistence and the WAL file format
11-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9+
----------------------------------------
1210

1311
To maintain data persistence, Tarantool writes each data change request (insert,
1412
update, delete, replace, upsert) into a write-ahead log (WAL) file in the
@@ -114,9 +112,8 @@ a secondary key, the record in the .xlog file will contain the primary key.
114112

115113
.. _internals-snapshot:
116114

117-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
118115
The snapshot file format
119-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
116+
------------------------
120117

121118
The format of a snapshot .snap file is nearly the same as the format of a WAL .xlog file.
122119
However, the snapshot header differs: it contains the instance's global unique identifier
@@ -131,3 +128,43 @@ and ``_cluster`` -- will be at the start of the .snap file, before the records o
131128
any spaces that were created by users.
132129

133130
Secondarily, the .snap file's records are ordered by primary key within space id.
131+
132+
.. _box_protocol-xlog:
133+
134+
Example
135+
-------
136+
137+
The header of a ``.snap`` or ``.xlog`` file looks like:
138+
139+
.. code-block:: none
140+
141+
<type>\n SNAP\n or XLOG\n
142+
<version>\n currently 0.13\n
143+
Server: <server_uuid>\n where UUID is a 36-byte string
144+
VClock: <vclock_map>\n e.g. {1: 0}\n
145+
\n
146+
147+
After the file header come the data tuples.
148+
Tuples begin with a row marker ``0xd5ba0bab`` and
149+
the last tuple may be followed by an EOF marker
150+
``0xd510aded``.
151+
Thus, between the file header and the EOF marker, there
152+
may be data tuples that have this form:
153+
154+
.. code-block:: none
155+
156+
0 3 4 17
157+
+-------------+========+============+===========+=========+
158+
| | | | | |
159+
| 0xd5ba0bab | LENGTH | CRC32 PREV | CRC32 CUR | PADDING |
160+
| | | | | |
161+
+-------------+========+============+===========+=========+
162+
MP_FIXEXT2 MP_INT MP_INT MP_INT ---
163+
164+
+============+ +===================================+
165+
| | | |
166+
| HEADER | | BODY |
167+
| | | |
168+
+============+ +===================================+
169+
MP_MAP MP_MAP
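The text header described above can be read with a few lines of Python. This is a minimal sketch, not Tarantool's own reader: the function name ``parse_log_header`` and the returned dictionary layout are made up for illustration, and the sample bytes reuse the UUID from the greeting example rather than data from a real file.

```python
def parse_log_header(data: bytes) -> dict:
    """Parse the text header of a .snap or .xlog file (illustrative helper).

    Returns the file type, the format version, the remaining "Key: value"
    fields, and the offset at which the binary rows start.
    """
    # The header ends at the first empty line, that is, at "\n\n".
    end = data.index(b"\n\n")
    lines = data[:end].decode("ascii").split("\n")
    file_type, version = lines[0], lines[1]
    fields = {}
    for line in lines[2:]:
        # Lines such as "Server: <uuid>" and "VClock: {1: 0}".
        key, _, value = line.partition(": ")
        fields[key] = value
    return {
        "type": file_type,
        "version": version,
        "fields": fields,
        "body_offset": end + 2,  # skip the two newline bytes
    }


# Hypothetical sample: a header followed by the 4-byte row marker.
sample = (
    b"XLOG\n0.13\n"
    b"Server: 29b74bed-fdc5-454c-a828-1d4bf42c639a\n"
    b"VClock: {1: 0}\n"
    b"\n"
    b"\xd5\xba\x0b\xab"
)
header = parse_log_header(sample)
```

The bytes starting at ``header["body_offset"]`` are the data tuples, each of which begins with the row marker ``0xd5ba0bab``.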
170+
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
.. _box_protocol-authentication:
2+
3+
Session start and authentication
4+
================================
5+
6+
Every iproto session begins with a greeting and optional authentication.
7+
8+
Greeting message
9+
----------------
10+
11+
When a client connects to the server instance, the instance responds with
12+
a 128-byte text greeting message, not in MsgPack format:
13+
14+
.. code-block:: none
15+
16+
Tarantool <version> (<protocol>) <instance-uuid>
17+
<salt>
18+
19+
For example:
20+
21+
.. code-block:: none
22+
23+
Tarantool 2.10.0 (Binary) 29b74bed-fdc5-454c-a828-1d4bf42c639a
24+
QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI=
25+
26+
The greeting contains two 64-byte lines of ASCII text.
27+
Each line ends with a newline character (:code:`\n`). If the line content is less than 64 bytes long,
28+
the rest of the line is padded with null bytes (ASCII code 0), which aren't displayed in the console.
29+
30+
The first line contains
31+
the instance version and protocol type. The second line contains the session salt --
32+
a base64-encoded random string, which is usually 44 bytes long.
33+
The salt is used in the authentication packet -- the :ref:`IPROTO_AUTH message <box_protocol-auth>`.
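As an illustration of the layout above, the greeting can be split into its two 64-byte lines as follows. This is a sketch under stated assumptions: ``parse_greeting`` is a hypothetical helper, and it strips spaces as well as null bytes so that it tolerates either kind of padding.

```python
import base64

GREETING_SIZE = 128  # total greeting length, from the description above
LINE_SIZE = 64       # each of the two lines


def parse_greeting(greeting: bytes) -> dict:
    """Split a 128-byte greeting into the banner line and the decoded salt."""
    if len(greeting) != GREETING_SIZE:
        raise ValueError("greeting must be exactly 128 bytes")
    # Drop the trailing newline and padding. Treating spaces as padding
    # in addition to null bytes is an assumption made for robustness.
    first = greeting[:LINE_SIZE].rstrip(b"\x00 \n").decode("ascii")
    second = greeting[LINE_SIZE:].rstrip(b"\x00 \n").decode("ascii")
    return {"banner": first, "salt": base64.b64decode(second)}


# Build a sample greeting from the example shown earlier.
line_1 = b"Tarantool 2.10.0 (Binary) 29b74bed-fdc5-454c-a828-1d4bf42c639a"
line_2 = b"QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI="
sample = line_1.ljust(63, b"\x00") + b"\n" + line_2.ljust(63, b"\x00") + b"\n"
parsed = parse_greeting(sample)
```

With the example salt, ``parsed["salt"]`` is 32 bytes long, which matches the 44-byte base64 text.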
34+
35+
.. _box_protocol-authentication_sequence:
36+
37+
Authentication
38+
--------------
39+
40+
If authentication is skipped, then the session user is ``'guest'``
41+
(the ``'guest'`` user does not need a password).
42+
43+
If authentication is not skipped, then at any time an :ref:`authentication packet <box_protocol-auth>`
44+
can be prepared using the greeting, the user's name and password,
45+
and `SHA-1 <https://en.wikipedia.org/wiki/SHA-1>`_ functions, as follows.
46+
47+
.. code-block:: none
48+
49+
PREPARE SCRAMBLE:
50+
51+
size_of_encoded_salt_in_greeting = 44;
52+
size_of_salt_after_base64_decode = 32;
53+
/* sha1() will only use the first 20 bytes */
54+
size_of_any_sha1_digest = 20;
55+
size_of_scramble = 20;
56+
57+
prepare 'chap-sha1' scramble:
58+
59+
salt = base64_decode(encoded_salt);
60+
step_1 = sha1(password);
61+
step_2 = sha1(step_1);
62+
step_3 = sha1(first_20_bytes_of_salt, step_2);
63+
scramble = xor(step_1, step_3);
64+
return scramble;
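The pseudocode above translates to Python roughly as follows. This is a sketch, not the reference implementation: the function name is made up, ``sha1(a, b)`` is read as hashing the concatenation of ``a`` and ``b``, and encoding the password as UTF-8 is an assumption.

```python
import base64
import hashlib


def chap_sha1_scramble(encoded_salt: str, password: str) -> bytes:
    """Compute the 20-byte 'chap-sha1' scramble (illustrative helper)."""
    salt = base64.b64decode(encoded_salt)
    # Assumption: the password is hashed as UTF-8 bytes.
    step_1 = hashlib.sha1(password.encode("utf-8")).digest()
    step_2 = hashlib.sha1(step_1).digest()
    # Only the first 20 bytes of the decoded salt are used.
    step_3 = hashlib.sha1(salt[:20] + step_2).digest()
    # Byte-wise XOR of two 20-byte digests.
    return bytes(a ^ b for a, b in zip(step_1, step_3))


scramble = chap_sha1_scramble(
    "QK2HoFZGXTXBq2vFj7soCsHqTo6PGTF575ssUBAJLAI=", "secret"
)
```

The salt here is the one from the greeting example, and ``"secret"`` is a placeholder password. The result is always 20 bytes long, the size of a SHA-1 digest.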
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
.. _internals-events:
2+
.. _box-protocol-watchers:
3+
4+
Events and subscriptions
5+
========================
6+
7+
The commands below support asynchronous server-client notifications signalled
8+
with :ref:`box.broadcast() <box-broadcast>`.
9+
Servers that support watchers set the ``IPROTO_FEATURE_WATCHERS`` feature in reply to the ``IPROTO_ID`` command.
10+
When the connection is closed, all watchers registered for it are unregistered.
11+
12+
The remote watcher (event subscription) protocol works in the following way:
13+
14+
#. The client sends an ``IPROTO_WATCH`` packet to subscribe to the updates of a specified key defined on the server.
15+
16+
#. The server sends an ``IPROTO_EVENT`` packet to the subscribed client after registration.
17+
The packet contains the key name and its current value.
18+
After that, the packet is sent every time the key value is updated with
19+
``box.broadcast()``, provided that the last notification was acknowledged (see below).
20+
21+
#. After receiving the notification, the client sends an ``IPROTO_WATCH`` packet to acknowledge the notification.
22+
23+
#. If the client doesn't want to receive any more notifications, it unsubscribes by sending
24+
an ``IPROTO_UNWATCH`` packet.
25+
26+
All three request types are asynchronous -- the receiving end doesn't send a packet in reply to any of them.
27+
Therefore, none of them has a sync number.
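The subscribe, notify, acknowledge cycle listed above can be modelled with a small in-memory class. This is a toy model, not Tarantool code: ``WatcherModel`` and its method names are invented for illustration. It also assumes that an update arriving before the acknowledgement is held back and delivered, with the latest value only, once the client acknowledges. The text above does not spell out that behaviour.

```python
class WatcherModel:
    """Toy model of one client's subscription to one key."""

    def __init__(self, value=None):
        self.value = value       # current value of the key on the server
        self.subscribed = False
        self.acked = True        # was the last notification acknowledged?
        self.dirty = False       # did the key change while unacknowledged?
        self.sent = []           # payloads of the IPROTO_EVENT packets "sent"

    def _notify(self):
        self.sent.append(self.value)
        self.acked = False
        self.dirty = False

    def watch(self):
        """IPROTO_WATCH: register on first use, acknowledge afterwards."""
        if not self.subscribed:
            self.subscribed = True
            self._notify()       # the watcher is notified after registration
        else:
            self.acked = True
            if self.dirty:       # assumption: deliver the held-back update
                self._notify()

    def broadcast(self, value):
        """Server side: the key is updated with box.broadcast()."""
        self.value = value
        if not self.subscribed:
            return
        if self.acked:
            self._notify()
        else:
            self.dirty = True

    def unwatch(self):
        """IPROTO_UNWATCH: stop receiving notifications."""
        self.subscribed = False


m = WatcherModel("a")
m.watch()           # register: event with "a"
m.broadcast("b")    # not acknowledged yet: held back
m.broadcast("c")    # still held back, only the latest value is kept
m.watch()           # acknowledge: event with "c"
m.watch()           # acknowledge again, nothing pending
m.broadcast("d")    # event with "d"
m.unwatch()
m.broadcast("e")    # no longer subscribed: no event
```

After this sequence ``m.sent`` holds the three delivered values in order.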
28+
29+
.. _box_protocol-watch:
30+
31+
IPROTO_WATCH
32+
------------
33+
34+
Code: 0x4a.
35+
36+
Registers a new watcher for the given notification key or confirms a notification if the watcher is
37+
already subscribed.
38+
The watcher is notified after registration.
39+
After that, the notification is sent every time the key is updated.
40+
The server doesn't reply to the request unless it fails to parse the packet.
41+
42+
.. raw:: html
43+
:file: images/events_watch.svg
44+
45+
.. _box_protocol-unwatch:
46+
47+
IPROTO_UNWATCH
48+
--------------
49+
50+
Code: 0x4b.
51+
52+
Unregisters a watcher subscribed to the given notification key.
53+
The server doesn't reply to the request unless it fails to parse the packet.
54+
55+
.. raw:: html
56+
:file: images/events_unwatch.svg
57+
58+
.. _box_protocol-event:
59+
60+
IPROTO_EVENT
61+
------------
62+
63+
Code: 0x4c.
64+
65+
Sent by the server to notify a client about an update of a key.
66+
67+
.. raw:: html
68+
:file: images/event.svg
69+
70+
``IPROTO_EVENT_DATA`` contains data sent to a remote watcher.
71+
The parameter is optional; the default value is ``MP_NIL``.
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
1+
.. _internals-iproto-format:
2+
3+
Request and response format
4+
===========================
5+
6+
The types referred to in this document are `MessagePack <http://MessagePack.org>`_ types.
7+
For their definitions, see the :ref:`MP_* MessagePack types <box_protocol-notation>` section.
8+
9+
.. _internals-unified_packet_structure:
10+
11+
Packet structure
12+
----------------
13+
14+
Requests and responses have a similar structure. They contain three sections: size, header, and body.
15+
16+
.. raw:: html
17+
:file: images/format.svg
18+
19+
It is legal to put more than one request in a packet.
20+
21+
Size
22+
----
23+
24+
The size is an MP_UINT -- an unsigned integer, usually 32-bit.
25+
It is the size of the header plus the size of the body.
26+
It may be useful to compare it with the number of bytes remaining in the packet.
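A reader of the stream might decode the size prefix and compare it with the bytes it has already received, as sketched below. The helper name ``decode_mp_uint`` is hypothetical. The format bytes (``0xcc`` to ``0xcf`` and the positive fixint range) come from the MessagePack specification.

```python
import struct


def decode_mp_uint(buf: bytes, offset: int = 0) -> tuple:
    """Decode one MessagePack unsigned integer.

    Returns (value, number_of_bytes_consumed).
    """
    first = buf[offset]
    if first <= 0x7F:  # positive fixint: the byte itself is the value
        return first, 1
    # uint8, uint16, uint32, uint64: big-endian payload after the marker
    formats = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q"}
    if first not in formats:
        raise ValueError(f"not an MP_UINT: 0x{first:02x}")
    fmt = formats[first]
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return value, 1 + struct.calcsize(fmt)


# A 32-bit size of 6 followed by only 4 of the 6 announced bytes.
buf = b"\xce\x00\x00\x00\x06" + b"\x82\x00\x00\x01"
size, consumed = decode_mp_uint(buf)
complete = len(buf) - consumed >= size  # False: wait for more bytes
```

When ``complete`` is false, the header and body have not fully arrived yet and the reader should wait for more data before decoding them.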
27+
28+
.. _box_protocol-header:
29+
30+
Header
31+
------
32+
33+
The header is an MP_MAP. It may contain, in any order:
34+
35+
.. raw:: html
36+
:file: images/header.svg
37+
38+
* Both the request and response make use of the :ref:`IPROTO_REQUEST_TYPE <internals-iproto-keys-request_type>` key.
39+
It denotes the type of the packet.
40+
41+
* The request and the matching response have the same sync number (:ref:`IPROTO_SYNC <internals-iproto-keys-sync>`).
42+
43+
* :ref:`IPROTO_SCHEMA_VERSION <internals-iproto-keys-schema_version>` is an optional key that indicates
44+
whether there was a major change in the schema.
45+
46+
* In :ref:`interactive transactions <txn_mode_stream-interactive-transactions>`,
47+
every stream is identified by a unique :ref:`IPROTO_STREAM_ID <box_protocol-iproto_stream_id>`.
48+
49+
In case of replicating :ref:`synchronous transactions <repl_sync>`,
50+
the header also contains the :ref:`IPROTO_FLAGS <box_protocol-flags>` key.
51+
52+
Encoding and decoding
53+
~~~~~~~~~~~~~~~~~~~~~
54+
55+
To see how Tarantool encodes the header, have a look at the file
56+
`xrow.c <https://github.com/tarantool/tarantool/blob/master/src/box/xrow.c>`_,
57+
function ``xrow_header_encode``.
58+
59+
To see how Tarantool decodes the header, have a look at the file
60+
`net_box.c <https://github.com/tarantool/tarantool/blob/master/src/box/lua/net_box.c>`__,
61+
function ``netbox_decode_data``.
62+
63+
For example, in a successful response to ``box.space:select()``,
64+
the IPROTO_REQUEST_TYPE value will be 0 = ``IPROTO_OK`` and the
65+
array will have all the tuples of the result.
66+
67+
Read the source code file `net_box.c <https://github.com/tarantool/tarantool/blob/master/src/box/lua/net_box.c>`__
68+
where the function ``decode_metadata_optional`` is an example of how Tarantool
69+
itself decodes extra items.
70+
71+
Body
72+
----
73+
74+
The body is an MP_MAP. The maximum iproto packet body length is 2 GiB.
75+
76+
The body has the details of the request or response. In a request, it can also
77+
be absent or be an empty map. Both states are interpreted the same way.
78+
Responses always contain a body, even for an
79+
:ref:`IPROTO_PING <box_protocol-ping>` request, where it will be an empty MP_MAP.
80+
81+
A lot of responses contain the IPROTO_DATA map:
82+
83+
.. raw:: html
84+
:file: images/body.svg
85+
86+
For most data-access requests (:ref:`IPROTO_SELECT <box_protocol-select>`,
87+
:ref:`IPROTO_INSERT <box_protocol-insert>`, :ref:`IPROTO_DELETE <box_protocol-delete>`, etc.)
88+
the body is an IPROTO_DATA map with an array of tuples, each of which is an array of fields.
89+
90+
IPROTO_DATA is what we get with net_box and :ref:`Module buffer <buffer-module>`,
91+
so if we were using net_box we could decode it with
92+
:ref:`msgpack.decode_unchecked() <msgpack-decode_unchecked_string>`,
93+
or we could convert to a string with :samp:`ffi.string({pointer},{length})`.
94+
The :ref:`pickle.unpack() <pickle-unpack>` function might also be helpful.
95+
96+
.. note::
97+
98+
For SQL-specific requests and responses, the body is a bit different.
99+
:ref:`Learn more <internals-iproto-sql>` about this type of packet.
100+
101+
.. _box_protocol-responses_error:
102+
103+
Error responses
104+
---------------
105+
106+
Instead of :ref:`IPROTO_OK <internals-iproto-ok>`, an error response header
107+
has IPROTO_REQUEST_TYPE = :ref:`IPROTO_TYPE_ERROR <internals-iproto-type_error>`.
108+
Its code is ``0x8XXX``, where ``XXX`` is the error code -- a value in
109+
`src/box/errcode.h <https://github.com/tarantool/tarantool/blob/master/src/box/errcode.h>`_.
110+
``src/box/errcode.h`` also has some convenience macros which define hexadecimal
111+
constants for return codes.
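A client can tell an error response from a successful one, and extract the error code, by inspecting the IPROTO_REQUEST_TYPE value. The sketch below assumes that ``0x8XXX`` means the high bit ``0x8000`` is the error flag and the remaining bits hold the code. The function name is hypothetical.

```python
IPROTO_OK = 0x00
IPROTO_TYPE_ERROR = 0x8000  # assumption: the flag bit behind "0x8XXX"


def split_response_type(request_type: int) -> tuple:
    """Return (is_error, code) for a response's IPROTO_REQUEST_TYPE."""
    if request_type & IPROTO_TYPE_ERROR:
        # Clear the flag bit; what remains is the code from errcode.h.
        return True, request_type & 0x7FFF
    return False, request_type


is_error, code = split_response_type(0x800A)
```

For ``0x800a`` this yields error code 10, which is the ER_SPACE_EXISTS code used in the example later in this section.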
112+
113+
The error response body is a map that contains two keys: :ref:`IPROTO_ERROR <internals-iproto-keys-error>`
114+
and :ref:`IPROTO_ERROR_24 <internals-iproto-keys-error>`.
115+
While IPROTO_ERROR contains an MP_EXT value, IPROTO_ERROR_24 contains a string.
116+
The two keys are provided to accommodate clients with older and newer Tarantool versions.
117+
118+
.. raw:: html
119+
:file: images/error.svg
120+
121+
Error responses before 2.4.1
122+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123+
124+
Before Tarantool v. :doc:`2.4.1 </release/2.4.1>`, the key IPROTO_ERROR contained a string
125+
and was identical to the current IPROTO_ERROR_24 key.
126+
127+
Let's consider an example. This is the fifth message, and the request was to create a duplicate
128+
space with ``conn:eval([[box.schema.space.create('_space');]])``.
129+
The unsuccessful response looks like this:
130+
131+
.. raw:: html
132+
:file: images/error_24.svg
133+
134+
The tutorial :ref:`Understanding the binary protocol <box_protocol-illustration>`
135+
shows actual byte codes of the response to the IPROTO_EVAL message.
136+
137+
Looking in `errcode.h <https://github.com/tarantool/tarantool/blob/master/src/box/errcode.h>`__,
138+
we find that the error code ``0x0a`` (decimal 10) is
139+
ER_SPACE_EXISTS, and the string associated with ER_SPACE_EXISTS is
140+
"Space '%s' already exists".
141+
142+
Since version :doc:`2.4.1 </release/2.4.1>`, responses for errors have extra information
143+
following what was described above. This extra information is given via the
144+
MP_ERROR extension type. See details in the :ref:`MessagePack extensions
145+
<msgpack_ext-error>` section.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
@startuml
2+
3+
skinparam map {
4+
HyperlinkColor #0077FF
5+
FontColor #313131
6+
BorderColor #313131
7+
BackgroundColor transparent
8+
}
9+
10+
json "**IPROTO_AUTH**" as auth_request {
11+
"Size": "MP_UINT",
12+
"Header": {
13+
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_REQUEST_TYPE]]": "IPROTO_AUTH",
14+
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_SYNC]]": "MP_UINT"
15+
},
16+
"Body": {
17+
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_USER_NAME]]": "MP_STR",
18+
"[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/keys IPROTO_TUPLE]]": {
19+
"MP_ARRAY": "[[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/authentication Authentication mechanism]], [[https://tarantool.io/en/doc/latest/dev_guide/internals/iproto/authentication scramble]]"
20+
}
21+
}
22+
}
23+
24+
@enduml
