
[Python][C++] Enable CSV reader to read from concatenated gzip stream #22382

@asfimport

Description


If two gzip files are concatenated, the result is itself a valid gzip file, because a gzip stream may contain several members one after another. However, `pyarrow.csv.read_csv` appears to read only the portion belonging to the first file.
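The gzip behaviour described above can be checked with Python's standard library alone. This is a hypothetical reconstruction, not the original `repro.py`. It assumes the first file holds the header `x` and the value `1`, and the second file holds only `2`, which is consistent with the two-row pandas output below. `gzip.decompress` reads both members. A single `zlib` decoder in gzip mode stops after the first member and yields exactly the one-row input reported for pyarrow. Whether that is the mechanism inside Arrow is an assumption here.

```python
import gzip
import zlib

# Hypothetical input: two independent gzip members, like `cat a.gz b.gz`.
part1 = gzip.compress(b"x\n1\n")  # first file: header "x" and one value
part2 = gzip.compress(b"2\n")     # second file: one more value, no header
combined = part1 + part2

# Python's gzip module treats the concatenation as one gzip file and
# decompresses every member in turn (this is what pandas relies on).
print(gzip.decompress(combined))  # b'x\n1\n2\n'

# A single zlib decoder in gzip mode (16 + MAX_WBITS) stops at the end of the
# first member; the second member is left over in `unused_data`.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
first_only = d.decompress(combined)
print(first_only)                      # b'x\n1\n'
print(d.eof, d.unused_data == part2)   # True True
```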

Running the repro script (`repro.py`) produces the following output:

```
$ python repro.py
pyarrow.csv only reads one row:
x
0 1
pandas reads two rows:
x
0 1
1 2
pyarrow version: 0.14.0
```
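One way to read such input completely is to keep decompressing until the input is exhausted, starting a fresh decoder on whatever bytes follow each member. The sketch below does this with `zlib`. The function name `decompress_all_members` is hypothetical, and this is not necessarily how the issue was fixed in Arrow.

```python
import csv
import gzip
import io
import zlib


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every member of a (possibly concatenated) gzip stream.

    Hypothetical helper: a new zlib decoder is started for each member, and
    the bytes it reports as `unused_data` are fed to the next decoder.
    """
    chunks = []
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            raise ValueError("truncated gzip member")
        # Anything after the end of this member is the start of the next one.
        data = decoder.unused_data
    return b"".join(chunks)


# Hypothetical two-file concatenation: header + "1", then "2".
combined = gzip.compress(b"x\n1\n") + gzip.compress(b"2\n")
text = decompress_all_members(combined).decode("utf-8")
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['x'], ['1'], ['2']]
```

On pyarrow versions affected by this issue, a possible workaround (an assumption, not verified against 0.14.0) is to let Python do the decompression, for example by passing the file object returned by `gzip.open(path, "rb")` to `pyarrow.csv.read_csv`, since Python's `gzip` module already reads every member.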

Reporter: Jordan Samuels
Assignee: Antoine Pitrou / @pitrou

PRs and other links:

Note: This issue was originally created as ARROW-5974. Please see the migration documentation for further details.
