Interactive transactions and streams in iproto/http2
Problem
With the introduction of the new transaction manager for memtx, it becomes possible to yield inside a transaction regardless of the engine used. This makes it possible to implement:
- Interactive transactions. Such a transaction doesn't need to be sent in one request: begin, commit, and the transaction's statements may be sent and executed in separate requests.
- Streams. This concept multiplexes several transactions over one connection. It is a more general approach and also makes interactive transactions possible.
Tarantool currently uses iproto as its main communication protocol, but HTTP/2 support is planned for the future. The goal is therefore to implement streams for both protocols.
Solutions
Rejected in favor of streams
Interactive tx over IPROTO without streams
It was decided that we need a more general approach: streams, and not only over iproto but over the HTTP/2 protocol as well.
Implementation
Introduce begin, commit, and rollback commands for a connection object (IPROTO_BEGIN, IPROTO_COMMIT, and IPROTO_ROLLBACK respectively in the iproto protocol).
Introduce a remote_txn structure: it contains a txn and an rlist of pending messages (iproto_msg) to be processed. This structure is allocated from a memory pool only for new transactions over IPROTO.
Add new fields to iproto_msg: 1) txn - the current transaction (or NULL); 2) next_in_tx - a member of remote_txn->pending.
Embed a remote_txn object inside iproto_connection.
Behaviour:
- If a call/eval implicitly begins a new txn and leaves it open, the txn will be silently rolled back.
- If there is no active txn in the connection, conn:commit() / conn:rollback() will be silently ignored.
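The behaviour above can be modelled in a few lines. This is a hypothetical Python sketch, not Tarantool's actual C implementation; the class and method names (RemoteTxn, on_call_return) are illustrative only.

```python
# Toy model of the per-connection interactive-transaction rules:
# commit/rollback without an open txn are ignored, and a txn left
# open by a call/eval is silently rolled back.

class RemoteTxn:
    """Holds the open transaction's uncommitted statements."""
    def __init__(self):
        self.statements = []

class Connection:
    def __init__(self):
        self.txn = None          # remote_txn, allocated only when needed
        self.committed = []

    def begin(self):             # IPROTO_BEGIN
        if self.txn is None:
            self.txn = RemoteTxn()

    def execute(self, stmt):
        if self.txn is not None:
            self.txn.statements.append(stmt)
        else:
            self.committed.append(stmt)   # autocommit outside a txn

    def commit(self):            # IPROTO_COMMIT: silently ignored without a txn
        if self.txn is None:
            return
        self.committed.extend(self.txn.statements)
        self.txn = None

    def rollback(self):          # IPROTO_ROLLBACK: silently ignored without a txn
        self.txn = None

    def on_call_return(self):
        # A call/eval implicitly began a txn and left it open:
        # silently roll it back.
        self.rollback()
```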
Streams
Old discussion
The idea of streams in iproto was previously [discussed](https://tkn.me/tarantool/rfc-interactive-transactions-in-iproto/) ([archive](https://tkn.me/tarantool/rfc-interactive-transactions-in-iproto.tar.bz2)). A short intro to the idea from that discussion:
**The problem**
If a CALL request leaves a transaction open upon return, the transaction
will be forcefully aborted. This makes sense for memtx, because it
doesn't support fiber yield (yet); in the case of vinyl, however, the
user may want to continue the same transaction in the next CALL, which
currently isn't possible.
Another use case is the SQL EXECUTE request, which is used for
executing SQL statements. The problem is that this request can only
execute a single SQL statement, so without transactions in IProto
it is impossible to implement SQL transactions on a remote client (e.g.
via JDBC).
See [1] for more details.
**The solution**
Introduce the concept of streams within an IProto connection, as
suggested by Georgy Kirichenko:
1. Introduce new request header key IPROTO_STREAM_ID.
2. The stream id is generated by the user and passed along with all
requests that are supposed to be executed in the same stream.
3. All requests within the same stream are executed sequentially.
4. Requests from different streams may be executed in parallel.
5. If a transaction is left open by a request in a stream, the next
request will reuse it.
6. If IPROTO_STREAM_ID is unset (0), everything works as before, i.e.
no transaction preservation or request serialization will occur.
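Rules 1, 2, and 6 can be sketched as a request-header builder. This is a simplified Python model; the numeric key is a placeholder, not the actual iproto constant.

```python
# Sketch of attaching IPROTO_STREAM_ID to a request header.
# 0x0a is a placeholder key -- the real value is defined by the
# iproto protocol, and real headers are msgpack maps, not dicts.

IPROTO_STREAM_ID = 0x0a  # placeholder, not the actual protocol constant

def make_header(request_type, sync, stream_id=0):
    header = {"type": request_type, "sync": sync}
    # Rule 6: stream id unset or 0 means legacy behaviour -- the key is
    # simply omitted, so no serialization or txn preservation applies.
    if stream_id != 0:
        header[IPROTO_STREAM_ID] = stream_id
    return header
```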
The net.box API will look like this:
c = net_box.connect(...)
s = c:make_stream(stream_id)
s:call(...)
A net.box stream instance will be a wrapper around the connection it was
created for. It will have all the same methods as the connection itself,
but all requests sent on its behalf will have the stream id attached.
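The wrapper idea can be illustrated with a small Python model. The names here (Connection.call, make_stream) mirror the proposal's net.box sketch; they are not a real client API.

```python
# Model of the proposed stream wrapper: a stream forwards every method
# to the connection it was created for, attaching its stream id.

class Connection:
    def __init__(self):
        self.sent = []   # log of (stream_id, func, args), in send order

    def call(self, func, *args, stream_id=0):
        # stream_id 0 means a plain, non-stream request
        self.sent.append((stream_id, func, args))

    def make_stream(self, stream_id):
        return Stream(self, stream_id)

class Stream:
    """Wrapper with the same methods as the connection; every request
    sent on its behalf carries the stream id."""
    def __init__(self, conn, stream_id):
        self._conn = conn
        self._id = stream_id

    def call(self, func, *args):
        self._conn.call(func, *args, stream_id=self._id)
```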
Questions extracted from the discussion:
- How to map stream ids to stream objects?
We will have to add a hash table mapping stream ids to stream
objects in the tx thread. A stream object would basically be a queue of
requests awaiting execution plus an open transaction, if any. When a new
request is sent to iproto, it is submitted to the tx thread and then
either executed directly by a fiber of the fiber pool or queued in the
stream object if there's already a request from the same stream being
executed by another fiber. When a fiber finishes executing a request, it
checks if there are more requests in the stream queue and continues
execution if so.
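The bookkeeping described above can be modelled like this. It is a single-threaded Python toy: a `busy` flag stands in for "a fiber is currently executing a request of this stream", and `finish` stands in for the fiber completing a request.

```python
# Toy model of the tx-thread dispatch: a hash table maps stream ids to
# stream objects (a queue of waiting requests plus a busy flag).

from collections import deque

class StreamState:
    def __init__(self):
        self.queue = deque()   # requests awaiting execution
        self.busy = False      # a fiber is executing a request of this stream

streams = {}
executed = []                  # execution order, for illustration

def submit(stream_id, request):
    state = streams.setdefault(stream_id, StreamState())
    if state.busy:
        state.queue.append(request)   # serialize behind the running request
    else:
        state.busy = True
        run(stream_id, request)

def run(stream_id, request):
    executed.append((stream_id, request))

def finish(stream_id):
    # The fiber finished a request: continue with the next queued
    # request of the same stream, if any; otherwise release the stream.
    state = streams[stream_id]
    if state.queue:
        run(stream_id, state.queue.popleft())
    else:
        state.busy = False
```

Note how requests from different streams interleave freely while requests within one stream stay strictly ordered.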
- Do we need to limit the number of streams?
- Expose tx id to client or not?
- What happens to opened transactions in a stream when a connection is closed? Rollback? What about on_rollback triggers?
- When to destroy a stream? (when last pending request processed / on timeout?)
- What happens when a request uses a reserved stream id?
- How to balance stream processing? How should the TX thread choose which of many streams to pick up a request from?
I think round-robin is fair enough, but maybe I am wrong.
<...>
I'd rather leave all the balancing to the fiber scheduler. IProto would
simply queue responses to the appropriate stream to be processed by the
fiber that is currently executing a request that belongs to the same
stream, or if there's none, start a new fiber.
- How do streams affect net_msg_max limit?
- SQL support: do changes from one stream affect another (see doc17)?
- Maybe we need an explicit stream open/close API? (see doc22)
Summary from Osipov:
- IPROTO_BEGIN/COMMIT/ROLLBACK only works if IPROTO_STREAM_ID is non-zero
- if stream id is zero, then dangling transactions are rolled back
as they are now
- all requests inside a stream are strictly sequential
- a stream owns its own diagnostics, transaction, transaction
isolation level, and possibly authenticated user (see below).
- better yet, IPROTO_SQL_EXECUTE is only available if stream id is
non-zero
The stream is a wrapper around the connection object:
local stream = conn:stream([optional id])
stream:call('box.begin')
stream:call('box.insert', {...})
stream:call('box.commit')
- There are no special commands to create and close a stream, and no server-side method to generate a new STREAM_ID. Any connector is capable of generating stream IDs for a given connection using any convenient strategy, for example an incremental sequence per connection.
- A stream is part of a connection. A request with the same STREAM_ID but in a different connection belongs to a different stream.
- Requests with the same non-zero STREAM_ID are processed sequentially (the TX thread must process the next request of a stream strictly after the previous request of the same stream is completed).
- Each stream can start its own transaction.
- Once a transaction in a stream is started, it is guaranteed that all requests of the stream are processed by one exclusive fiber until the transaction ends.
- Requests of the same stream outside a transaction may or may not be processed by an exclusive fiber.
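The "incremental sequence per connection" strategy mentioned above can be sketched as follows. This is an illustrative Python model, not a real connector; ids start at 1 because 0 means legacy, non-stream behaviour.

```python
# Client-side stream id generation: each connection hands out its own
# incremental ids. Ids from different connections may collide -- that is
# fine, because a stream is scoped to its connection.

import itertools

class Connection:
    def __init__(self):
        self._next_stream_id = itertools.count(1)   # 0 = "no stream"

    def stream(self):
        return Stream(self, next(self._next_stream_id))

class Stream:
    def __init__(self, conn, stream_id):
        self.conn = conn
        self.id = stream_id
```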
Ids
Each stream is associated with an id. The id is generated on the client side but is hidden from the actual user. Instead, the user operates on a stream object, which is internally mapped to the corresponding id.
We introduce a STREAM_ID request field in the IPROTO protocol. An omitted STREAM_ID, or STREAM_ID = 0, means legacy behavior.
Serialization
Requests in a stream are processed sequentially. The reason is that transactions can now yield, so we must wait for the previous request to complete before processing the next one in order to guarantee serialization.
Fibers created from a transaction
Problem
Imagine that a transaction in a stream creates a new fiber (e.g. implicitly via some library call). Should this fiber see uncommitted data from the transaction? And if the fiber also writes data, should those writes become part of the transaction as well?
Solution
One way to solve the problem is to forbid such fibers from reading or writing data, but this is too restrictive. Instead, let's say that such fibers are fully independent of the transaction: they cannot see any of its uncommitted data, and they can create their own transactions if they want.
Where stream is executed
Each stream is executed in its own fiber. The TX thread must somehow decide what to process next - maybe we need some sophisticated prioritization?
Stream closing
We can return the corresponding fiber back to the pool when:
- A commit / rollback happens.
- No transaction is running in this stream and there are no pending requests left.
If a connection is closed, all its streams are closed and all uncommitted transactions in those streams are rolled back.
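These two rules can be condensed into a couple of predicates. A minimal Python sketch, assuming the stream state is reduced to "transaction open?" and "pending request count":

```python
# Rule 1-2: the stream's fiber can go back to the pool only when no
# transaction is open and no requests are pending.

def can_release_fiber(txn_open, pending_requests):
    """True when the stream no longer needs a dedicated fiber."""
    return not txn_open and pending_requests == 0

def close_connection(streams):
    """On connection close, roll back every open transaction.
    streams: list of dicts like {'id': int, 'txn_open': bool}.
    Returns the ids of streams whose transactions were rolled back."""
    rolled_back = [s["id"] for s in streams if s["txn_open"]]
    for s in streams:
        s["txn_open"] = False
    return rolled_back
```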
Limits
Stream limit
The number of streams is limited by the existing net_msg_max option.
Pending messages limit
Imagine an infinite yielding loop inside a transaction in a stream, producing infinitely many messages from that stream. Again, as I understand it, this is already limited by net_msg_max.
HTTP/2.0
The concept of streams is already encapsulated in the protocol. The main idea is to multiplex several streams through one TCP connection.
A stream can be started by either the client or the server.
A stream id must be a positive integer: odd for client-initiated streams, even for server-initiated ones. Stream id zero is reserved.
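The id rule matches RFC 7540, section 5.1.1, and can be checked in one function. A minimal sketch:

```python
# HTTP/2 stream-id rule: ids are positive; client-initiated streams use
# odd ids, server-initiated ones use even ids; id 0 is reserved for the
# connection itself.

def valid_stream_id(stream_id, initiator):
    if stream_id <= 0:                  # 0 is reserved, negatives invalid
        return False
    if initiator == "client":
        return stream_id % 2 == 1       # odd
    if initiator == "server":
        return stream_id % 2 == 0       # even
    raise ValueError("initiator must be 'client' or 'server'")
```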