Fix COPY TO does not produce an output file for the empty set #18074
base: main
```rust
};

// Single-file output requires creating at least one file stream in advance.
// If no record batches are present in the input stream (zero-row scenario),
```
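The comment above describes opening the single output file eagerly. Below is a minimal std-only Rust sketch of that idea under stated assumptions. It is not DataFusion's demuxer code: `Batch`, `OutputFile`, `demux_lazy` and `demux_eager` are hypothetical names, and `schema_written` only models the schema metadata the finished Parquet file would carry.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: only the row count matters here.
#[derive(Debug, Clone)]
pub struct Batch {
    pub num_rows: usize,
}

/// Hypothetical output file handle. `schema_written` models the schema metadata
/// a finished Parquet file carries; `batches` is the data appended to it.
#[derive(Debug)]
pub struct OutputFile {
    pub path: String,
    pub schema_written: bool,
    pub batches: Vec<Batch>,
}

impl OutputFile {
    fn open(path: &str) -> Self {
        // Opening a file records the schema, even if no rows ever follow.
        OutputFile { path: path.to_string(), schema_written: true, batches: Vec::new() }
    }

    pub fn total_rows(&self) -> usize {
        self.batches.iter().map(|b| b.num_rows).sum()
    }
}

/// Behavior before the fix (sketch): the file is only opened when the first
/// batch arrives, so an empty input stream produces no file at all.
pub fn demux_lazy<I: IntoIterator<Item = Batch>>(path: &str, input: I) -> Vec<OutputFile> {
    let mut file: Option<OutputFile> = None;
    for batch in input {
        file.get_or_insert_with(|| OutputFile::open(path)).batches.push(batch);
    }
    file.into_iter().collect()
}

/// Behavior after the fix (sketch): open the single output file *before*
/// consuming the input, so a zero-row result still yields exactly one file.
pub fn demux_eager<I: IntoIterator<Item = Batch>>(path: &str, input: I) -> Vec<OutputFile> {
    let mut file = OutputFile::open(path);
    for batch in input {
        file.batches.push(batch);
    }
    vec![file]
}
```

With an empty input, `demux_lazy` returns no files while `demux_eager` returns one file with zero rows and the schema recorded, which is the behavior this PR asks for in the single-file case.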
I think there might be a bit missing from this comment. The sentence doesn't seem complete.
Rephrased it. The last sentence was indeed incomplete.
Branch updated from 09173f8 to 373f969.
Quick question about multi-file output behavior (non-partitioned case): do we want to guarantee at least one file on disk here too? If so, we'd just need to remove the …
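To make the open question concrete, here is a std-only Rust sketch of a non-partitioned multi-file demux with a hypothetical `ensure_one_file` switch. All names (`Batch`, `OutputFile`, `demux_multi_file`, `max_rows_per_file`, the `part-N.parquet` naming) are illustrative assumptions, not DataFusion APIs, and the sketch never splits a batch, so a file may exceed the row limit.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`.
#[derive(Debug, Clone)]
pub struct Batch {
    pub num_rows: usize,
}

/// Hypothetical output file: a path plus the batches written to it.
#[derive(Debug)]
pub struct OutputFile {
    pub path: String,
    pub batches: Vec<Batch>,
}

impl OutputFile {
    fn open(path: String) -> Self {
        OutputFile { path, batches: Vec::new() }
    }

    pub fn total_rows(&self) -> usize {
        self.batches.iter().map(|b| b.num_rows).sum()
    }
}

/// Multi-file, non-partitioned demux (sketch). Files are opened lazily; a new
/// file is started once the current one holds at least `max_rows_per_file` rows.
/// When `ensure_one_file` is set, an empty input still yields one empty file.
pub fn demux_multi_file<I: IntoIterator<Item = Batch>>(
    base_dir: &str,
    input: I,
    max_rows_per_file: usize,
    ensure_one_file: bool,
) -> Vec<OutputFile> {
    let mut files: Vec<OutputFile> = Vec::new();
    for batch in input {
        let needs_new_file = match files.last() {
            None => true,
            Some(current) => current.total_rows() >= max_rows_per_file,
        };
        if needs_new_file {
            let idx = files.len();
            files.push(OutputFile::open(format!("{base_dir}/part-{idx}.parquet")));
        }
        // The branch above guarantees at least one file exists at this point.
        files.last_mut().expect("a file was just opened").batches.push(batch);
    }
    // Zero-row input: only create a file if the caller asked for the guarantee.
    if files.is_empty() && ensure_one_file {
        files.push(OutputFile::open(format!("{base_dir}/part-0.parquet")));
    }
    files
}
```

Keeping the guarantee behind a flag makes the design choice explicit: the lazy path stays unchanged for non-empty inputs, and only the zero-row case differs depending on which behavior is agreed on.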
…ed test to validate schema is correct)
Thank you @bert-beyondloops 🙏
The behavior of writing empty files has come up several times before, and I worry we will go back and forth on the implementation if we don't have the expected results written down somewhere.
Could you also please add some documentation to the top of this module / the demuxer tasks to make it clearer what behavior is expected?
Which issue does this PR close?
COPY TO does not produce a single output file for an empty set
Rationale for this change
Executing the following SQL does not create an output file on disk:
```sql
COPY (SELECT 1 AS id WHERE FALSE) TO 'table_no_rows.parquet';
```
I would expect it to create a Parquet file containing 0 rows, including the schema metadata.
Being able to query the schema of such a table is still valuable information.
What changes are included in this PR?
Are these changes tested?
An additional COPY TO test was added to the copy.slt sqllogictests.
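The property the new test checks is that the output file exists on disk and still carries the schema when the query returns zero rows. As a std-only analogue, here is a hypothetical Rust sketch that uses a CSV-style header line in place of Parquet schema metadata; `write_single_file` is an illustrative name, not a DataFusion API, and the real test is the sqllogictest in copy.slt.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical writer: the header line stands in for the schema metadata,
/// followed by one comma-separated line per row.
fn write_single_file(path: &Path, columns: &[&str], rows: &[Vec<i64>]) -> io::Result<()> {
    // Create the file before looking at the rows, so an empty
    // result set still leaves a file (with its "schema") on disk.
    let mut file = fs::File::create(path)?;
    writeln!(file, "{}", columns.join(","))?;
    for row in rows {
        let fields: Vec<String> = row.iter().map(|v| v.to_string()).collect();
        writeln!(file, "{}", fields.join(","))?;
    }
    Ok(())
}
```

Writing zero rows with columns `["id"]` leaves a file whose only content is the header line `id`, mirroring what `COPY (SELECT 1 AS id WHERE FALSE) TO ...` should now produce.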
Are there any user-facing changes?
A file containing 0 rows (with its schema) is now created when COPY TO targets a single output file and the query returns no rows.