Pass csv parameters during duckdb connection #648
{ "pattern": "^.$" }
],
"description": "Only for format = json. How multiple json documents are delimited within one file"
"description": "For JSON format, only 'new_line' or 'array' is allowed to indicate how multiple JSON documents are delimited. For CSV format, any single character can be used as the delimiter between columns. Only valid for CSV."
The master copy of the schema is here: https://github.com/datacontract/datacontract-specification/
@jochenchrist should I not modify this one and just change the one in the other repo then?
Please don't change the file here; propose the change in the other repo instead.
We can still continue to use the provided keys in the data_contract_specification (as custom extensions).
Allows duckdb to load the csv file correctly and lets SodaCL check for field presence. This fix does not check for incorrect ordering of columns.
Removed logging warning.
fix: Typo in datacontract.schema
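The distinction in the first commit message (field presence is checked, column order is not) can be illustrated with a standard-library sketch. It does not use DuckDB or SodaCL and is not the code from this PR; the function name missing_fields is hypothetical.

```python
import csv
import io


def missing_fields(csv_text: str, expected_fields: list[str], delimiter: str = ",") -> list[str]:
    """Return the expected fields that do not appear in the CSV header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])  # an empty input yields an empty header
    present = {name.strip() for name in header}
    # Set membership ignores position, so reordered columns are not reported.
    return [field for field in expected_fields if field not in present]
```

A header of "b;a" with expected fields a, b, c reports only c as missing; the swapped order of a and b goes unnoticed, which matches the limitation noted in the commit message.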
"rich>=13.7,<13.10", | ||
"sqlglot>=26.6.0,<27.0.0", | ||
"duckdb==1.1.2", | ||
"fsspec", |
Where is this used?
In file datacontract\engines\soda\connections\duckdb.py, the method sniff_csv_header uses duckdb.from_csv_auto to read a csv file as a stream. Without fsspec, the following fails:
return duckdb.from_csv_auto(io.BytesIO(header_line), **csv_params).columns
E duckdb.duckdb.InvalidInputException: Invalid Input Error: This operation could not be completed because required module 'fsspec' is not installed
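A minimal sketch of the dependency described above, assuming duckdb and fsspec may or may not be installed. The function name sniff_csv_columns and the up-front module check are illustrative additions, not the PR's sniff_csv_header; only the duckdb.from_csv_auto call on an io.BytesIO buffer is taken from the comment.

```python
import importlib.util
import io


def sniff_csv_columns(csv_bytes: bytes, **csv_params):
    """Return the column names DuckDB detects in an in-memory CSV snippet."""
    # DuckDB reads file-like objects through fsspec, so check for both modules first
    # and fail with a clear message instead of DuckDB's InvalidInputException.
    missing = [m for m in ("duckdb", "fsspec") if importlib.util.find_spec(m) is None]
    if missing:
        raise RuntimeError("required module(s) not installed: " + ", ".join(missing))
    import duckdb  # imported lazily so the check above can run without it

    return duckdb.from_csv_auto(io.BytesIO(csv_bytes), **csv_params).columns
```

Called as sniff_csv_columns(b"a;b\n1;2\n", delimiter=";", header=True), this either returns the detected column names or raises a RuntimeError naming the missing module.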
The filename is not in line with the naming conventions (test__.py).
Please move these tests to test_test_local_csv.
But the tests do not test the test routine. They test whether duckdb can handle mismatched column specifications when reading csv files.
Do you still want them in test_test_local_csv.py?
Hi, there is also a regression, introduced by sniff_csv_header, when using a