-
Notifications
You must be signed in to change notification settings - Fork 264
Description
Describe the bug
This was found during debugging CI failures of #1050.
One example of failed test is date_add with int scalars in CometExpressionSuite. The query is "SELECT _20 + CAST(2 as $intType) from tbl which has simply a CometScan + CometProject + Spark ColumnarToRowExec.
CometProject (i.e., DataFusion ProjectExec) doesn't store arrays internally. The only possibility that fails the safety check is that the arrays are not released before we fill next values into the CometBuffers.
In Spark ColumnarToRowExec, once it pulls out all rows from current ColumnarBatch, it simply assigns it to null to release the JVM object, but close is never called on the batch object to release vector resources (e.g., for Comet, Arrow arrays). It is more complicated than just add a close call there because Spark uses WritableColumnVector there for some components (e.g., Parquet reader). Once close is called on a WritableColumnVector, it will make the vector "not" writable anymore.
To completely fix it, we need some changes in Spark. I did a quick experiment locally in Spark and verified that if a close is properly called on non WritableColumnVector there, failed tests can pass without failing the safety check.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response