Document graceful shutdown of net.box connections #3100

Merged
merged 7 commits into from
Sep 7, 2022
44 changes: 44 additions & 0 deletions doc/dev_guide/internals/box_protocol.rst
@@ -967,6 +967,8 @@ See the :ref:`Watchers <box-protocol-watchers>` section below.
Watchers
--------

Since :doc:`2.10.0 </release/2.10.0>`.

The commands below support asynchronous server-client notifications signalled
with :ref:`box.broadcast() <box-broadcast>`.
Servers that support the new feature set the ``IPROTO_FEATURE_WATCHERS`` feature in reply to the ``IPROTO_ID`` command.
@@ -1073,6 +1075,48 @@ The body is a 2-item map:
``IPROTO_EVENT_DATA`` (code 0x57) contains data sent to a remote watcher.
The parameter is optional, the default value is ``nil``.

.. _box-protocol-shutdown:

Graceful shutdown protocol
--------------------------

Since :doc:`2.10.0 </release/2.10.0>`.

The graceful shutdown protocol helps prevent the loss of in-flight requests when the server is asked to shut down.
According to the protocol, when a server receives an ``os.exit()`` command or a ``SIGTERM`` signal,
it does not exit immediately.
Instead, the server first stops listening for new connections.
Then, the server sends the shutdown packets to all connections that support the graceful shutdown protocol.
When a client is notified about the upcoming server exit, it stops accepting new requests and
waits for active requests to complete before closing the connection.
Once all connections are terminated, the server shuts down.

The protocol uses the event subscription system.
That is, the feature is available if the server supports the :ref:`box.shutdown <system-events_box-shutdown>` event
and ``IPROTO_WATCH``.
For more information, see the :ref:`reference for event watchers <box-watchers>`
and the :ref:`corresponding section <box-protocol-watchers>` of this document.

The shutdown protocol works in the following way:

#. First, the server receives a shutdown request.
It can be either an ``os.exit()`` command or a :ref:`SIGTERM <admin-server_signals>` signal.

#. Then the :ref:`box.shutdown <system-events_box-shutdown>` event is generated.
The server broadcasts it to all subscribed remote watchers (see :ref:`IPROTO_WATCH <box_protocol-watch>`).
That is, the server calls :ref:`box.broadcast('box.shutdown', true) <box-broadcast>`
from the :ref:`box.ctl.on_shutdown() <box_ctl-on_shutdown>` trigger callback.
Once this is done, the server stops listening for new connections.

#. From now on, the server waits until all subscribed connections are terminated.

#. At the same time, the client gets the ``box.shutdown`` event and shuts the connection down gracefully.

#. After all connections are closed, the server stops.
If the connections are not closed before the timeout expires, Tarantool exits immediately.
You can set the timeout with the
:ref:`set_on_shutdown_timeout() <box_ctl-on_shutdown_timeout>` function.
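
As a rough illustration of the client side of these steps, the sketch below subscribes
to the ``box.shutdown`` key through a ``net.box`` connection and reports when its value becomes ``true``.
The helper name ``watch_shutdown`` and the ``on_notice`` callback are assumptions made for this example.
``net.box`` already reacts to the event on its own, so an explicit watcher like this is mostly useful for logging or debugging.

.. code-block:: lua

    -- Sketch only: `conn` is assumed to be an established net.box connection
    -- to a server that supports IPROTO_WATCH and the box.shutdown event.
    local function watch_shutdown(conn, on_notice)
        -- The watcher callback receives the event key and its current value.
        -- It is also invoked once with the current value right after subscription,
        -- so only a value of `true` is treated as a shutdown request.
        return conn:watch('box.shutdown', function(key, value)
            if value == true then
                on_notice(key)
            end
        end)
    end

The value returned by ``conn:watch()`` is expected to be a watcher handle that can be unregistered once it is no longer needed.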

.. _box_protocol-responses:

Responses if no error and no SQL
4 changes: 4 additions & 0 deletions doc/reference/reference_lua/box_ctl.rst
@@ -44,6 +44,9 @@ Below is a list of all ``box.ctl`` functions.
* - :doc:`./box_ctl/on_shutdown`
- Create a "shutdown trigger"

* - :doc:`./box_ctl/set_on_shutdown_timeout`
- Set a timeout in seconds for the ``on_shutdown`` trigger

* - :doc:`./box_ctl/is_recovery_finished`
- Check if recovery has finished

@@ -57,5 +60,6 @@ Below is a list of all ``box.ctl`` functions.
box_ctl/wait_rw
box_ctl/on_schema_init
box_ctl/on_shutdown
box_ctl/set_on_shutdown_timeout
box_ctl/is_recovery_finished
box_ctl/promote
10 changes: 6 additions & 4 deletions doc/reference/reference_lua/box_ctl/on_shutdown.rst
@@ -1,16 +1,17 @@
.. _box_ctl-on_shutdown:
.. _box_ctl-on_shutdown:

===============================================================================
box.ctl.on_shutdown()
===============================================================================

.. module:: box.ctl
.. module:: box.ctl

The ``box.ctl`` submodule also contains two functions for the two
:ref:`server trigger <triggers>` definitions: ``on_shutdown`` and ``on_schema_init``.
Please, familiarize yourself with the mechanism of trigger functions before using them.
Details about trigger characteristics are in the :ref:`triggers <triggers-box_triggers>` section.

.. function:: on_shutdown(trigger-function [, old-trigger-function])
.. function:: on_shutdown(trigger-function [, old-trigger-function])

Create a "shutdown :ref:`trigger <triggers>`".
The ``trigger-function`` will be executed
@@ -29,5 +30,6 @@ Please, familiarize yourself with the mechanism of trigger functions before usin
If the parameters are (nil, old-trigger-function), then the old
trigger is deleted.

Details about trigger characteristics are in the :ref:`triggers <triggers-box_triggers>` section.
If you want to set a timeout for this trigger,
use the :ref:`set_on_shutdown_timeout <box_ctl-on_shutdown_timeout>` function.

18 changes: 18 additions & 0 deletions doc/reference/reference_lua/box_ctl/set_on_shutdown_timeout.rst
@@ -0,0 +1,18 @@
.. _box_ctl-on_shutdown_timeout:

===============================================================================
box.ctl.set_on_shutdown_timeout()
===============================================================================

.. module:: box.ctl

.. function:: set_on_shutdown_timeout([timeout])

Set a timeout for the :ref:`on_shutdown <box_ctl-on_shutdown>` trigger.
If the timeout expires, the server stops immediately
regardless of whether any ``on_shutdown`` triggers are left unexecuted.

:param double timeout: time in seconds to wait for the trigger to complete. The default value is 3 seconds.

:return: nil
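
A minimal sketch of how the timeout and a trigger might be configured together.
The helper name ``configure_shutdown`` and the ``cleanup`` callback are hypothetical and exist only for this example.

.. code-block:: lua

    -- Sketch only: `cleanup` is a hypothetical application callback.
    -- `box` is the global available inside a running Tarantool instance.
    local function configure_shutdown(timeout, cleanup)
        -- Allow the on_shutdown triggers up to `timeout` seconds in total.
        box.ctl.set_on_shutdown_timeout(timeout)
        -- Register the callback to run when a shutdown request arrives.
        box.ctl.on_shutdown(cleanup)
    end

For instance, ``configure_shutdown(10, function() require('log').info('shutting down') end)``
would raise the limit from the default 3 seconds to 10 seconds.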

2 changes: 2 additions & 0 deletions doc/reference/reference_lua/box_events.rst
@@ -3,6 +3,8 @@
Event watchers
==============

Since :doc:`2.10.0 </release/2.10.0>`.

The ``box`` module contains some features related to event subscriptions, also known as :term:`watchers <watcher>`.
The subscriptions are used to inform the client about server-side :term:`events <event>`.
Each event subscription is defined by a certain key.
27 changes: 23 additions & 4 deletions doc/reference/reference_lua/box_events/system_events.rst
@@ -3,6 +3,8 @@
System events
=============

Since :doc:`2.10.0 </release/2.10.0>`.

Predefined events have a special naming schema -- their names always start with the reserved ``box.`` prefix.
This means that you cannot create new events with this prefix.

@@ -12,6 +14,7 @@ The system processes the following events:
* ``box.status``
* ``box.election``
* ``box.schema``
* ``box.shutdown``

In response to each event, the server sends back certain ``IPROTO`` fields.

@@ -26,7 +29,7 @@ This triggers the ``box.info`` event, which states that the value of ``box.info.
while ``box.info.uuid`` and ``box.info.cluster.uuid`` remain the same.

box.id
~~~~~~
------

Contains :ref:`identification <box_info_info>` of the instance.
Value changes are rare.
@@ -50,7 +53,7 @@ Value changes are rare.
}

box.status
~~~~~~~~~~
----------

Contains generic information about the instance status.

@@ -67,7 +70,7 @@ Contains generic information about the instance status.
}

box.election
~~~~~~~~~~~~
------------

Contains fields of :doc:`box.info.election </reference/reference_lua/box_info/election>`
that are necessary to find out the most recent writable leader.
@@ -87,7 +90,7 @@ that are necessary to find out the most recent writable leader.
}

box.schema
~~~~~~~~~~
----------

Contains schema-related data.

@@ -99,6 +102,22 @@ Contains schema-related data.
MP_STR “version”: MP_UINT schema_version,
}

.. _system-events_box-shutdown:

box.shutdown
------------

Contains a boolean value which indicates whether there is an active shutdown request.

The event is generated when the server receives a shutdown request (``os.exit()`` command or
:ref:`SIGTERM <admin-server_signals>` signal).

The ``box.shutdown`` event is used by the graceful shutdown protocol,
which is available since :doc:`2.10.0 </release/2.10.0>`.
Connectors are expected to use this protocol to learn about the upcoming server shutdown and
close active connections without breaking requests in progress.
For more information, refer to the :ref:`graceful shutdown protocol <box-protocol-shutdown>` section.

Usage example
-------------

66 changes: 53 additions & 13 deletions doc/reference/reference_lua/net_box.rst
@@ -56,29 +56,37 @@ Most ``net.box`` methods accept the last ``{options}`` argument, which can be:
The default value is ``false``.
For an example, see option description :ref:`below <net_box-return_raw>`.

.. _net_box-state_diagram:

The diagram below shows possible connection states and transitions:

.. ifconfig:: builder not in ('latex', )

.. image:: net_states.svg
.. image:: net_states.png
:align: center
:alt: net_states.svg
:alt: net_states.png

On this diagram:

* ``net_box.connect()`` method spawns a worker fiber, which will establish the connection and start the state machine.
* ``net_box.connect()`` method spawns a worker fiber, which will establish the connection and start the state machine.

* The state machine goes to the ``initial`` state.

* The state machine goes to the ‘initial‘ state.
* Authentication and schema upload.
It is possible later on to re-enter the ``fetch_schema`` state from ``active`` to trigger schema reload.

* Authentication and schema upload.
It is possible later on to re-enter the ‘fetch_schema’ state from ‘active’ to trigger schema reload.
* The state changes to the ``graceful_shutdown`` state when the state machine
receives a :ref:`box.shutdown <system-events_box-shutdown>` event from the remote host
(see :ref:`conn:on_shutdown() <net_box-on_shutdown>`).
Once all pending requests are completed, the state machine switches to the ``error`` (``error_reconnect``) state.

* The transport goes to the ‘error’ state in case of an error.
It can happen, for example, if the server closed the connection.
If the ``reconnect_after`` option is set, instead of the ‘error’ state, the transport goes to the ‘error_reconnect’ state.
* The transport goes to the ``error`` state in case of an error.
It can happen, for example, if the server closed the connection.
If the ``reconnect_after`` option is set, instead of the ``error`` state,
the transport goes to the ``error_reconnect`` state.

* ``conn.close()`` method sets the state to closed and kills the worker.
If the transport is already in the error state, ``close()`` does nothing.
* ``conn.close()`` method sets the state to ``closed`` and kills the worker.
If the transport is already in the ``error`` state, ``close()`` does nothing.
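
The states listed above can be checked from application code.
The sketch below waits for the ``active`` state before sending a request and reports the current state otherwise.
The helper name ``call_when_active`` is an assumption made for this example;
it relies on ``conn:wait_state()``, ``conn:call()``, and the ``conn.state`` field of a ``net.box`` connection.

.. code-block:: lua

    -- Sketch only: `conn` is assumed to be a net.box connection object.
    local function call_when_active(conn, func_name, args, timeout)
        -- wait_state() is expected to return false if the timeout expires
        -- or the connection is closed before the state is reached.
        if not conn:wait_state('active', timeout) then
            return nil, 'connection state: ' .. tostring(conn.state)
        end
        return conn:call(func_name, args)
    end

A connection that has already moved to ``graceful_shutdown`` does not become ``active`` again on its own,
so in that case the helper returns ``nil`` and a message once the timeout expires.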

===============================================================================
Index
@@ -131,7 +139,9 @@ Below is a list of all ``net.box`` functions.
* - :ref:`conn:on_connect() <net_box-on_connect>`
- Define a connect trigger
* - :ref:`conn:on_disconnect() <net_box-on_disconnect>`
- Define a disconnect trigger
- Define a disconnect trigger
* - :ref:`conn:on_shutdown() <net_box-on_shutdown>`
- Define a shutdown trigger
* - :ref:`conn:on_schema_reload() <net_box-on_schema_reload>`
- Define a trigger when schema is modified
* - :ref:`conn:new_stream() <conn-new_stream>`
@@ -820,9 +830,39 @@ With the ``net.box`` module, you can use the following
be replaced by trigger-function
:return: nil or function pointer

.. _net_box-on_shutdown:

.. function:: conn:on_shutdown([trigger-function[, old-trigger-function]])

Define a trigger that is executed when a :ref:`box.shutdown <system-events_box-shutdown>` event is received.

The trigger starts in a new fiber.
While the ``on_shutdown()`` trigger is running, the connection stays active.
It means that the trigger callback is allowed to send new requests.

After the trigger returns, the ``net.box`` connection goes to the ``graceful_shutdown`` state
(see :ref:`the state diagram <net_box-state_diagram>` for details).
In this state, no new requests are allowed.
The connection waits for all pending requests to be completed.

Once all in-progress requests have been processed, the connection is closed.
The state changes to ``error`` or ``error_reconnect``
(if the ``reconnect_after`` option is defined).

Servers that do not support the ``box.shutdown`` event or :ref:`IPROTO_WATCH <box_protocol-watch>`
just close the connection abruptly.
In this case, the ``on_shutdown()`` trigger is not executed.

:param function trigger-function: function which will become the trigger
function. Takes the ``conn``
object as the first argument
:param function old-trigger-function: existing trigger function which will
be replaced by trigger-function
:return: nil or function pointer
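
A minimal sketch of a shutdown trigger that sends one last request while the connection is still active.
The helper name ``install_shutdown_trigger`` and the remote function passed as ``final_func`` are hypothetical
and only illustrate the behavior described above.

.. code-block:: lua

    -- Sketch only: `conn` is assumed to be a net.box connection, and
    -- `final_func` the name of a hypothetical function on the remote server.
    local function install_shutdown_trigger(conn, final_func)
        local function trigger(c)
            -- The connection stays active while the trigger runs,
            -- so a final request can still be sent from here.
            c:call(final_func)
        end
        -- Passing a single function registers it as a new trigger.
        conn:on_shutdown(trigger)
        return trigger
    end

The trigger function is returned so that the caller can later pass it as ``old-trigger-function`` to replace it.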

.. _net_box-on_schema_reload:

.. function:: conn:on_schema_reload([trigger-function[, old-trigger-function]])
.. function:: conn:on_schema_reload([trigger-function[, old-trigger-function]])

Define a trigger executed when some operation has been performed on the remote
server after schema has been updated. So, if a server request fails due to a
Binary file added doc/reference/reference_lua/net_states.png