Bug with read_table, skiprows, and C engine

I'm reading the file available at ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. The data start on line 73.

If I use the default C engine with `read_table`, I have to specify `skiprows=85` to load the table properly:

``` python
import pandas as pd

# C engine (the default): skiprows=85 is needed to land on the first data row
pd.read_table(
        'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=85, engine='c',
        names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
```

But if I use the Python engine then the expected `skiprows=72` works:

``` python
# Python engine: skiprows=72 matches the 72 lines that precede the data
pd.read_table(
        'co2_mm_mlo.txt', sep=r'\s+', header=None, skiprows=72, engine='python',
        names=['year', 'month', 'dec_year', 'average', 'interpolated', 'trend', 'days'])
```

If I use `skiprows=72` with the C engine, the resulting DataFrame has 691 rows instead of the expected 679, and it includes data from the header.
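
A smaller check that does not depend on the NOAA file might look like the sketch below. It builds a synthetic file in memory and parses it with both engines using the same `skiprows`. The blank lines in the header are only an assumption about what could make the engines count lines differently; the header text, the data values, and the `load` helper are all made up for illustration.

``` python
import io

import pandas as pd

# Synthetic stand-in for the NOAA file: five header lines (two of them blank)
# followed by three made-up data rows. None of these values come from the real file.
header_lines = ["# header line 1", "", "# header line 2", "", "# year month value"]
data_lines = ["2000 1 10.5", "2000 2 11.5", "2000 3 12.5"]
text = "\n".join(header_lines + data_lines) + "\n"


def load(engine):
    """Parse the synthetic text with the given engine, skipping the whole header."""
    return pd.read_table(
        io.StringIO(text), sep=r"\s+", header=None,
        skiprows=len(header_lines), engine=engine,
        names=["year", "month", "value"])


c_df = load("c")
py_df = load("python")

# If both engines count skipped lines the same way, the two shapes match.
print("C engine:     ", c_df.shape)
print("Python engine:", py_df.shape)
```

If the two shapes differ on an affected version, that would point at line counting in the header rather than at anything specific to the NOAA data.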

I've confirmed this behavior on Mac OS X Yosemite with pandas 0.15.0 and with a checkout of master@5cf3d85a7d4c448519fa08f918a114209cfbdf2b.


Bug with read_table, skiprows, and C engine #8679
