Bug with read_table, skiprows, and C engine #8679

Closed
jiffyclub opened this issue Oct 30, 2014 · 2 comments · Fixed by #8752
Labels
Bug IO CSV read_csv, to_csv
Milestone

Comments

@jiffyclub

I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.

If I use the default C engine with read_table I have to specify skiprows=85 to properly load the table:

import pandas as pd

pd.read_table(
    'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c',
    names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

But if I use the Python engine then the expected skiprows=72 works:

pd.read_table(
    'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python',
    names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])

The resulting DataFrame is expected to have 679 rows. If I use skiprows=72 with the C engine, it has 691 rows and includes data from the header.

I've confirmed this behavior on Mac OS X Yosemite with Pandas 0.15.0 and a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.
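
A minimal sketch of how the two engines could be compared without downloading the NOAA file. The in-memory text is a made-up stand-in (a few '#' header lines, some ending in a trailing space, followed by two data rows), and the helper name load is hypothetical. On a pandas version that includes the fix, both engines should return the same frame; this is a consistency check, not a guaranteed reproduction of the bug.

```python
import io

import pandas as pd

# Synthetic stand-in for co2_mm_mlo.txt (assumption: not the real file).
# Some header lines deliberately end with a trailing space.
header = ["# header line %d " % i for i in range(5)] + ["#", "# last header line"]
data = [
    "1958 3 1958.208 315.71 315.71 314.62 -1",
    "1958 4 1958.292 317.45 317.45 315.29 -1",
]
text = "\n".join(header + data) + "\n"

names = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']


def load(engine):
    """Read the synthetic text with the given parser engine."""
    return pd.read_table(
        io.StringIO(text), sep=r'\s+', header=None,
        skiprows=len(header), engine=engine, names=names)


c_df = load('c')
py_df = load('python')

# If the engines disagree on how many lines skiprows consumed,
# the shapes (and the first rows) will differ.
print(c_df.shape, py_df.shape)
```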

@jreback
Contributor

jreback commented Oct 30, 2014

So if you specify comment='#', the C parser gives the same result. It must be an interaction between skiprows and skip_blank_lines (e.g. somehow it thinks it's a 'real' line).

Probably related to #8661.

cc @mdmueller
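
A rough sketch of the comment='#' approach mentioned above, assuming the header lines all begin with '#'. The sample text is invented for illustration and is not the NOAA file. With comment='#', fully commented lines are dropped by the parser, so no skiprows count is needed.

```python
import io

import pandas as pd

# Invented sample: two '#' header lines (one with a trailing space),
# then two whitespace-separated data rows.
text = (
    "# header line with a trailing space \n"
    "# another header line\n"
    "1958 3 1958.208 315.71 315.71 314.62 -1\n"
    "1958 4 1958.292 317.45 317.45 315.29 -1\n"
)

names = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']

# Let the parser discard the header via comment='#' instead of skiprows.
df = pd.read_table(
    io.StringIO(text), sep=r'\s+', header=None, comment='#',
    engine='c', names=names)

print(len(df))
```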

jreback added Bug IO CSV read_csv, to_csv labels Oct 30, 2014
jreback added this to the 0.15.1 milestone Oct 30, 2014
@selasley
Contributor

selasley commented Nov 5, 2014

This seems to be related to the problem with skiprows, header=None and lines ending in a trailing space. I downloaded the file, removed all trailing spaces, and was able to read the data correctly with skiprows=72 and engine='c'. Tested on Mac OS X Yosemite with Python 3.4.2 from python.org and pandas version '0.15.0-85-gaf2bfb7'.
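
One way the trailing-space workaround could be scripted, as a sketch only: strip trailing whitespace from every line in memory and hand the cleaned text to the parser. The helper name read_without_trailing_spaces and the sample lines are hypothetical; for the real file, the lines would come from opening co2_mm_mlo.txt and the call would use skiprows=72.

```python
import io

import pandas as pd


def read_without_trailing_spaces(lines, **kwargs):
    """Strip trailing whitespace from each line, then parse with read_table."""
    cleaned = "\n".join(line.rstrip() for line in lines) + "\n"
    return pd.read_table(io.StringIO(cleaned), **kwargs)


# Invented sample lines, each ending in a trailing space.
raw_lines = [
    "# header line one ",
    "# header line two ",
    "1958 3 1958.208 315.71 315.71 314.62 -1 ",
    "1958 4 1958.292 317.45 317.45 315.29 -1 ",
]

names = ['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days']

df = read_without_trailing_spaces(
    raw_lines, sep=r'\s+', header=None, skiprows=2, engine='c', names=names)

print(df.shape)
```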
