Empty cells make pandas use float, even if read_csv(dtype={'FOO': str}) is used #17810
Comments
I believe this is expected behavior. From read_csv
Maybe the converters arg to read_csv is what you're after |
@MartinThoma If you look at the values of the column, you will see pandas correctly preserved the data as strings (as you specified with
The only 'gotcha' is that empty strings are still seen as missing values (and thus converted to NaN), and not kept as an empty string. So your solution of filling the missing values with empty string ( |
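To make the point above concrete, here is a minimal sketch. The CSV contents and the `BAR` column are made up for illustration (the original test.csv is not shown in this thread), and `keep_default_na=False` is one option I know `read_csv` accepts, not necessarily what was suggested above.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: the second row has an empty FOO cell.
csv_text = "FOO,BAR\nabc,1\n,2\nxyz,3\n"

# dtype=str keeps the non-empty values as strings, but the empty cell
# is still treated as missing and comes back as NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype={"FOO": str})
is_missing = pd.isna(df["FOO"][1])

# Option 1: replace the missing values with empty strings after reading.
filled = df["FOO"].fillna("")

# Option 2: turn off the default NA markers so '' is never parsed as NaN.
# Note this applies to every column, not just FOO.
df_keep = pd.read_csv(
    io.StringIO(csv_text), dtype={"FOO": str}, keep_default_na=False
)

print(is_missing)
print(filled.tolist())
print(df_keep["FOO"].tolist())
```

The NaN in the string column is a float, which is probably why it looks like pandas "switched to float" for that cell; the other values are still strings.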
The solution of using the converters arg ( |
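For reference, a small sketch of the converters approach, again with made-up CSV contents. This assumes the default C parser, where a converter is applied to the raw cell text, so the empty cell is passed to `str` as `''` rather than being turned into NaN.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: the second row has an empty FOO cell.
csv_text = "FOO,BAR\nabc,1\n,2\nxyz,3\n"

# The converter receives each raw cell as text; str() leaves it unchanged,
# so the empty cell stays an empty string.
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"FOO": str})

print(df_conv["FOO"].tolist())
```

Unlike `keep_default_na=False`, this only affects the columns listed in `converters`.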
I seem to recall this issue coming up before. It would be helpful to link to prior discussions. |
I don't directly find another related issue, apart from #1450, which you can actually do as well: add |
Why is |
Is it only me, or is the type inference and missing data handling part of reading input data an idiosyncratic part of pandas dataframes? Anyway thanks for all the advice. |
Seems like this is the intended behavior which is documented in |
Take a look at this: pd.read_csv('csv_file.csv', dtype={'special_id': int}). That code throws this error: It is because the given column has empty cells, which I expected to be considered NaN. Without the |
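A sketch of the integer case, with a made-up CSV in place of csv_file.csv. Plain `int` cannot hold NaN, so the read fails when a cell is empty. The nullable `"Int64"` dtype used below is an assumption on my part: it exists in newer pandas releases, not in the 0.20.3 reported in this issue.

```python
import io

import pandas as pd

# Hypothetical stand-in for csv_file.csv: the second row has an empty special_id.
csv_text = "special_id,name\n1,a\n,b\n3,c\n"

# A plain int dtype has no way to represent the missing cell, so read_csv raises.
raised = False
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"special_id": int})
except ValueError:
    raised = True

# The nullable integer dtype (capital "I") keeps integers and marks the gap as <NA>.
df_int = pd.read_csv(io.StringIO(csv_text), dtype={"special_id": "Int64"})

print(raised)
print(df_int["special_id"].dtype)
print(df_int["special_id"].isna().tolist())
```

On versions without the nullable dtype, the usual fallbacks are reading the column as float or as str and converting afterwards.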
Code Sample, a copy-pastable example if possible
test.csv:
Problem description
When I use dtype={'FOO': str}, I expect pandas to treat the column as a string. This seems to work, but when an empty cell is present pandas seems to switch to float.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None