
Minor inaccuracy in documentation of read_csv's option mangle_dupe_cols #19203


Closed
Bernhard10 opened this issue Jan 12, 2018 · 4 comments · Fixed by #19208

Comments

@Bernhard10
Contributor

Bernhard10 commented Jan 12, 2018

Code example:

File test.csv:

,a,a,b
0,1,2,3
1,4,5,6

Python code:

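import pandas as pd
# index_col=0 turns the unnamed first column into the index;
# without it that column would show up as 'Unnamed: 0'
df = pd.read_csv("test.csv", index_col=0)
df.columns.values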

This gives ['a', 'a.1', 'b'] and not, as documented, ['a.0', 'a.1', 'b'].

Problem description

The documentation of mangle_dupe_cols states that duplicate columns will be specified as 'X.0'...'X.N', but in fact the names become 'X', 'X.1', ..., 'X.N'.

So, contrary to the documentation, the first occurrence of a duplicate column name is left unchanged, and only subsequent occurrences get a number appended.
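A minimal pure-Python sketch of the renaming rule observed above, for illustration only. The helper name `mangle_dupe_names` is hypothetical and this is not pandas' source code; in particular, how it resolves a generated name that already exists (for example a literal 'a.1' column) is an assumption made so the output stays unique.

```python
from collections import defaultdict
from typing import List


def mangle_dupe_names(names: List[str]) -> List[str]:
    """Hypothetical helper: rename duplicates the way read_csv was observed to.

    The first occurrence of a name is kept as-is; later occurrences get
    '.1', '.2', ... appended. If a generated name is already taken, another
    suffix is appended (an assumption for illustration, not pandas' code).
    """
    counts = defaultdict(int)  # times each (possibly renamed) name has been used
    result = []
    for name in names:
        col = name
        cur_count = counts[col]
        # Keep appending a suffix until we land on a name that is still free.
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f"{col}.{cur_count}"
            cur_count = counts[col]
        result.append(col)
        counts[col] = cur_count + 1
    return result


print(mangle_dupe_names(["a", "a", "b"]))  # ['a', 'a.1', 'b']
print(mangle_dupe_names(["X", "X", "X"]))  # ['X', 'X.1', 'X.2']
```

The first call reproduces the column names reported above for test.csv; note that no name ever receives a '.0' suffix.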

Expected Output

Either change the code to mangle the first duplicate column name, or simply fix the documentation.
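For comparison, the first option (numbering every duplicate from 0, as the docstring described) could look roughly like the sketch below. `number_all_dupes` is a hypothetical name; this is the alternative proposed in this report, not what read_csv does, and it deliberately ignores collisions with existing names such as a literal 'X.0'.

```python
from collections import Counter
from typing import List


def number_all_dupes(names: List[str]) -> List[str]:
    """Hypothetical helper: the 'X.0'...'X.N' scheme the docstring described.

    Every occurrence of a duplicated name gets a zero-based suffix,
    including the first; names that appear only once are left alone.
    Collisions with existing names such as a literal 'X.0' are not handled.
    """
    totals = Counter(names)    # how often each name appears overall
    seen = Counter()           # how many occurrences have been numbered so far
    result = []
    for name in names:
        if totals[name] > 1:
            result.append(f"{name}.{seen[name]}")
            seen[name] += 1
        else:
            result.append(name)
    return result


print(number_all_dupes(["a", "a", "b"]))  # ['a.0', 'a.1', 'b']
```

Changing the code this way would rename the first duplicate column in existing users' DataFrames, which is one reason fixing the documentation is the less disruptive option.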

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.11-200.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 36.6.0
Cython: None
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche
Member

That observation seems correct. Do you want to do a PR to update the docs?

jreback added this to the Next Major Release milestone Jan 12, 2018
@Bernhard10
Contributor Author

Sure, I'll submit a PR today.

@bhavybarca

I would like to contribute to this issue.

@jorisvandenbossche
Member

@bhavybarca, there is already an open PR for this.

jreback modified the milestones: Next Major Release, 0.23.0 Jan 15, 2018
4 participants