Description
When working with ArrowWriter, I would like to flush buffered rows to disk. However, calling ArrowWriter<W>::flush() flushes only part of the data. The reason is that parquet::file::writer::TrackedWrite, which ArrowWriter uses internally, inserts a BufWriter on top of the user-supplied writer W. This BufWriter is not flushed when ArrowWriter<W>::flush() is called.
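The underlying effect can be reproduced with the standard library alone. The following is a minimal sketch, not the parquet code itself: it wraps an in-memory Vec<u8> in a std::io::BufWriter and checks how many bytes have reached the sink before and after an explicit flush. The function name sink_len_before_and_after_flush is made up for this illustration.

```rust
use std::io::{BufWriter, Write};

/// Returns the number of bytes that reached the sink before and after
/// an explicit flush of the BufWriter. (Hypothetical helper, for illustration.)
fn sink_len_before_and_after_flush() -> std::io::Result<(usize, usize)> {
    // Stand-in for the user-supplied writer W.
    let sink: Vec<u8> = Vec::new();

    // Stand-in for the BufWriter that gets inserted on top of W.
    let mut buffered = BufWriter::new(sink);

    // A small write stays in the BufWriter's internal buffer.
    buffered.write_all(b"row group bytes")?;
    let before = buffered.get_ref().len();

    // Only an explicit flush pushes the bytes down to the sink.
    buffered.flush()?;
    let after = buffered.get_ref().len();

    Ok((before, after))
}

fn main() -> std::io::Result<()> {
    let (before, after) = sink_len_before_and_after_flush()?;
    println!("bytes in sink before flush: {before}, after flush: {after}");
    Ok(())
}
```

If the outer flush() never reaches the inner BufWriter, the sink stays in the "before" state until the writer is closed or dropped.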
The best solution to this problem would be to remove the BufWriter from TrackedWrite and just use the user-supplied writer. The BufWriter is supposed to buffer small writes, but this is not needed when writing to memory, and most operating systems already employ this sort of mechanism for files. Thus, it is redundant. Maybe a BufWriter could be beneficial when working with a bare-metal system, but then a user could just wrap their writer in a BufWriter and give it to ArrowWriter. Nonetheless, I guess that DataFusion is not often run on bare metal.
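To make the proposal concrete, here is a rough sketch of what a pass-through tracking writer could look like. This is not the actual TrackedWrite implementation; CountingWrite and the two helper functions are hypothetical names used only for illustration. The wrapper counts bytes and forwards both write and flush to the inner writer, so buffering becomes an opt-in choice made by the caller.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical pass-through writer: tracks bytes written but adds
/// no buffering of its own.
struct CountingWrite<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> CountingWrite<W> {
    fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }
}

impl<W: Write> Write for CountingWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Forward the flush so that any buffering chosen by the user is honoured.
        self.inner.flush()
    }
}

/// Without a BufWriter, bytes reach the sink as soon as they are written.
fn unbuffered_sink_len() -> io::Result<usize> {
    let mut w = CountingWrite::new(Vec::new());
    w.write_all(b"abc")?;
    Ok(w.inner.len())
}

/// A user who wants buffering wraps their own writer in BufWriter;
/// flush() on the outer writer then reaches the sink.
fn opt_in_buffered_sink_len() -> io::Result<(usize, usize)> {
    let mut w = CountingWrite::new(BufWriter::new(Vec::new()));
    w.write_all(b"abc")?;
    let before = w.inner.get_ref().len();
    w.flush()?;
    Ok((before, w.inner.get_ref().len()))
}

fn main() -> io::Result<()> {
    println!("unbuffered: {}", unbuffered_sink_len()?);
    println!("opt-in buffered (before, after): {:?}", opt_in_buffered_sink_len()?);
    Ok(())
}
```

With this shape, a caller writing to memory pays no buffering cost, while a caller who needs buffering passes BufWriter::new(file) as W and gets a flush() that actually reaches the underlying writer.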