
Conversation

@Halpph commented Feb 19, 2025

  • Tests pass
  • ruff format
  • README.md updated (if relevant)
  • CHANGELOG.md entry added

{ "pattern": "^.$" }
],
Previous description: "Only for format = json. How multiple json documents are delimited within one file"
Proposed description: "For JSON format, only 'new_line' or 'array' is allowed to indicate how multiple JSON documents are delimited. For CSV format, any single character can be used as the delimiter between columns. Only valid for CSV."
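The proposed description distinguishes two ways multiple JSON documents can be delimited within one file. A minimal stdlib sketch of what 'new_line' vs 'array' delimiting means in practice (the helper name split_json_documents is hypothetical, not part of the codebase):

```python
import json

def split_json_documents(text: str, delimiter: str) -> list:
    """Split a file's content into individual JSON documents.

    delimiter: 'new_line' -> one document per line (NDJSON style)
               'array'    -> the file is a single top-level JSON array
    """
    if delimiter == "new_line":
        # Parse each non-empty line as its own JSON document.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if delimiter == "array":
        docs = json.loads(text)
        if not isinstance(docs, list):
            raise ValueError("expected a top-level JSON array")
        return docs
    raise ValueError(f"unsupported delimiter: {delimiter!r}")
```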

Author

@jochenchrist should I not modify this one and just change the one in the other repo then?

Contributor

Please don't modify the file here, but propose a change in the other repo.

We can still continue to use the provided keys in the data_contract_specification (as custom extensions).

stefanedwards and others added 5 commits February 26, 2025 15:33
Allows duckdb to load the csv file correctly and lets SodaCL check for field presence.

This fix does not check for incorrect ordering of columns.
fix: Typo in datacontract.schema
"rich>=13.7,<13.10",
"sqlglot>=26.6.0,<27.0.0",
"duckdb==1.1.2",
"fsspec",
Contributor

Where is this used?

Contributor

In datacontract/engines/soda/connections/duckdb.py, the method sniff_csv_header uses duckdb.from_csv_auto to read a CSV file as a stream.

Without fsspec, the following fails:

return duckdb.from_csv_auto(io.BytesIO(header_line), **csv_params).columns
E       duckdb.duckdb.InvalidInputException: Invalid Input Error: This operation could not be completed because required module 'fsspec' is not installed
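The failing call hands duckdb a file-like object, which duckdb can only register when the optional fsspec module is installed. For illustration, a dependency-free stand-in for the same header-sniffing idea, using only Python's stdlib csv module (hypothetical helper, not the code under review):

```python
import csv
import io

def sniff_csv_header(sample: bytes, encoding: str = "utf-8") -> list:
    """Return the column names from a CSV sample's first line.

    Stdlib-only stand-in for the duckdb.from_csv_auto call: the dialect
    (delimiter, quoting) is detected with csv.Sniffer, then the header
    row is parsed with csv.reader.
    """
    text = sample.decode(encoding)
    dialect = csv.Sniffer().sniff(text)
    return next(csv.reader(io.StringIO(text), dialect))
```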

Contributor

The filename is not in line with the naming conventions (test__.py).
Please move these tests to test_test_local_csv.py.

Contributor

But the tests do not test the test routine. They test whether duckdb can handle mismatched column specifications when reading CSV files.

Do you still want them in test_test_local_csv.py?

@dmaresma
Contributor

Hi, there is also a regression, introduced by sniff_csv_header:

when using `with open(model_path, 'rb')`, model_path could be an abfss:// or s3:// blob or data lake store URL, which is not supported by Python's built-in open() function.
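One way to guard against that regression is to branch on the URL scheme before calling open(), so that remote object-store paths can be routed through fsspec.open() (or a cloud SDK) instead. A minimal sketch using only the stdlib (the scheme list and helper name are assumptions for illustration):

```python
from urllib.parse import urlparse

# Schemes that need fsspec (or a cloud SDK) rather than the built-in open().
# This set is an assumption for illustration, not from the codebase.
REMOTE_SCHEMES = {"s3", "abfss", "gs", "az", "http", "https"}

def is_remote_path(model_path: str) -> bool:
    """Return True when model_path points at a remote object store.

    Plain open() only handles local files; s3:// or abfss:// URLs
    would have to be opened through fsspec instead.
    """
    return urlparse(model_path).scheme in REMOTE_SCHEMES
```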
