
BUG: read_csv names argument inconsistent between C and Python engines #38453


Closed
3 tasks done
phofl opened this issue Dec 13, 2020 · 0 comments · Fixed by #44654
Labels
Bug IO CSV read_csv, to_csv
Milestone

Comments

@phofl
Member

phofl commented Dec 13, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

After #38445 there is another inconsistency left to address.

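import io

import pandas as pd

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
# C engine: returns a DataFrame (see below)
pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="c")

# Python engine: raises ValueError (see below)
pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python")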

Problem description

The bug is caused by the differing lengths of the header and the names argument.
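To make the mismatch concrete, here is a minimal sketch that counts the fields per line of the sample above using only the standard-library csv module. It is an illustration of the input's shape, not a description of how either pandas engine parses internally; the variable names are chosen for this example.

```python
import csv
import io

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
names = ["A", "B", "C", "D", "E"]

# Split the sample into a header row and data rows.
rows = list(csv.reader(io.StringIO(s)))
header, data = rows[0], rows[1:]

# The header line has four fields; the trailing comma on each data
# line produces a fifth, empty field.
header_width = len(header)
data_widths = [len(row) for row in data]

print(header_width, data_widths, len(names))  # prints: 4 [5, 5] 5
```

So names matches the width of the data rows but not the width of the header line.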

This returns

   A  B  C  D   E
0  1  2  3  4 NaN
1  5  6  7  8 NaN

for the C engine and raises the following for the Python engine:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.3/scratches/scratch_4.py", line 323, in <module>
    print(pd.read_csv(io.StringIO(s), header=0, names=['A', 'B', 'C', 'D', "E"], engine="python"))
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2303, in __init__
    ) = self._infer_columns()
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers.py", line 2692, in _infer_columns
    raise ValueError(
ValueError: Number of passed names did not match number of header fields in the file

Process finished with exit code 1

Expected Output

I would expect both engines to return the same result, and the Python engine not to raise.
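A small sketch for checking this expectation on a given pandas installation: it runs the same call with both engines and records whether each one succeeds or raises. The helper name try_engine is made up for this example, and the outcome for the Python engine depends on the pandas version in use.

```python
import io

import pandas as pd

s = """a, b, c, d
1,2,3,4,
5,6,7,8,"""
names = ["A", "B", "C", "D", "E"]


def try_engine(engine):
    """Hypothetical helper: return ("ok", columns) or ("raised", message)."""
    try:
        df = pd.read_csv(io.StringIO(s), header=0, names=names, engine=engine)
    except ValueError as exc:
        return ("raised", str(exc))
    return ("ok", list(df.columns))


outcomes = {engine: try_engine(engine) for engine in ("c", "python")}
for engine, (status, detail) in outcomes.items():
    print(f"{engine}: {status}: {detail}")

# True only when both engines behave the same way on this input.
consistent = outcomes["c"] == outcomes["python"]
print("consistent:", consistent)
```

If consistent is False, the installed version still shows the discrepancy described in this issue.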

Output of pd.show_versions()

master

phofl added the Bug, Needs Triage, and IO CSV labels and removed the Needs Triage label on Dec 13, 2020
jreback added this to the 1.4 milestone on Nov 28, 2021