In the scope of #208, @Totktonada stated:

I think we can write a potentially large request chunk-by-chunk to a socket. To achieve that we need to teach our msgpack encoder to work in a lazy manner: receive a stream as an input parameter and return a stream as a result. The main point is that this way we can avoid materializing streams in memory.
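A minimal sketch of what such a stream-in/stream-out encoder could look like (the LazyMsgPackEncoder name and its method are hypothetical, not part of MsgPackLite):

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical shape of a lazy encoder: instead of producing a byte[],
// it exposes the encoded value as a stream that can be copied to the
// socket chunk by chunk.
interface LazyMsgPackEncoder {
    // Encodes a binary value whose payload arrives as a stream of a known
    // length and returns a stream of the msgpack-encoded bytes.
    InputStream encodeBin(InputStream payload, long payloadLength) throws IOException;
}
```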
JDBC streams support was provided for PreparedStatement in #208. Due to the lack of streams support in IPROTO as well as in the MsgPackLite implementation, the driver materializes a stream in memory before it is encoded and sent to the socket.
Rationale: it's possible to send/receive quite long binary data (for instance, MP_BIN32 stores a byte array whose length is up to (2^32)-1 bytes, i.e. about 4 GB). As a result, it would be convenient to push/fetch bytes partially, chunk-by-chunk, for such enormous data. An InputStream, especially a BufferedInputStream, looks like a suitable abstraction over bytes "on demand".
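As an illustration (a sketch only, not connector code), streaming an MP_BIN32 value with a known length could look like this: write the five-byte header, then copy the payload in chunks:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch: streaming an MP_BIN32 value to an output
// without materializing the payload in memory.
final class BinStreamingSketch {
    static void writeBin32(OutputStream out, InputStream payload, long length) throws IOException {
        // msgpack bin32 header: 0xc6 marker followed by a big-endian 32-bit length
        out.write(0xc6);
        out.write((int) (length >>> 24));
        out.write((int) (length >>> 16));
        out.write((int) (length >>> 8));
        out.write((int) length);
        // copy the payload chunk by chunk
        byte[] buffer = new byte[8192];
        int read;
        while ((read = payload.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}
```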
There are a few open questions:
1. Do we support both byte arrays and streams to represent binary data?
2. Maybe it's worth providing such support only for mp_bin16/32 and using byte[] for mp_bin8 (one possible API shape is sketched after this list).
3. The current implementation requires that a packet be fully encoded before it is sent; conversely, a response must be completely read and decoded before it is returned. How to support laziness here is a big question.
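For the first two questions, one hypothetical shape of such an encoder API (not the existing MsgPackLite interface) could keep byte[] for small values and accept a stream plus a known length for large ones:

```java
import java.io.InputStream;

// Hypothetical writer API: small binaries stay in memory as byte[],
// larger ones (mp_bin16/32) may be supplied as a stream with a known length.
interface BinWriter {
    void packBin(byte[] value);                    // small values, fully in memory
    void packBin(InputStream value, long length);  // large values, streamed chunk by chunk
}
```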
It seems the last point is the most important one to clarify.
Do we support both byte arrays and streams to represent binary data?
Maybe it's worth providing such support only for mp_bin16/32 and using byte[] for mp_bin8.
Are these questions about the msgpack encoder/decoder API, the client API, or the JDBC client API? Can you expand a bit, please?
The current implementation requires that a packet be fully encoded before it is sent; conversely, a response must be completely read and decoded before it is returned. How to support laziness here is a big question.
If this is a question about how to support it with the Tarantool binary protocol, then let me cite my comment on PR #208:
I would not say that the protocol does not support streaming. At least, if you know the size of your data, you can calculate the packet size before encoding it, send the packet header, and then do a bunch of writes to the socket. When the size of the data is known, it is quite easy to calculate the msgpack size. However, you are right in the case when the size of the data is not known: a client should materialize it or somehow calculate its size before it starts sending.
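To illustrate that calculation for a single bin value (a sketch under the assumption that the payload length is known; the helper name is made up):

```java
// Sketch: the encoded size of a msgpack bin value can be derived from the
// payload length alone, so a packet header can be written before the
// payload itself is encoded.
final class MsgPackSizes {
    static long binEncodedSize(long payloadLength) {
        if (payloadLength <= 0xFF) {
            return 2 + payloadLength;   // bin8:  0xc4 marker + 1-byte length
        } else if (payloadLength <= 0xFFFF) {
            return 3 + payloadLength;   // bin16: 0xc5 marker + 2-byte length
        } else {
            return 5 + payloadLength;   // bin32: 0xc6 marker + 4-byte length
        }
    }
}
```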
If the question is about the connector implementation, then I guess it requires implementing size calculation without actual encoding, plus support for chunk-by-chunk encoding in our msgpack encoder. The same goes for the decoder.
When I commented on PR #208 I didn't consider that, according to the JDBC API, a data size may be unknown. We, however, need a data size to create a packet header. Since the JDBC API allows sending streams without a known size, such streams will need to be materialized anyway. So now I think that support for lazy msgpack encoding has much less value. More on that: Tarantool is not intended to be used as a storage for large data records (at least for now).
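For reference, java.sql.PreparedStatement offers setBinaryStream both with and without a length argument; the table and column names below are made up for illustration:

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Both overloads are part of the standard JDBC API; only the one that
// supplies a length lets the driver compute a packet header up front.
final class StreamBindingExample {
    static void bindBlob(Connection conn, InputStream data, long knownLength) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO t (id, payload) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            ps.setBinaryStream(2, data, knownLength); // length known up front
            // ps.setBinaryStream(2, data);           // length unknown: must be materialized first
            ps.executeUpdate();
        }
    }
}
```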
So I don't think it is worth implementing this issue.
@nicktorwald Are there questions that still need to be clarified (maybe the first one), or can we close this?
Are these questions about the msgpack encoder/decoder API, the client API, or the JDBC client API? Can you expand a bit, please?
That question isn't so important; it was about the MsgPack API: should it support both byte[] for mp_bin8 and InputStream for bigger data sizes (mp_bin16/32)?