read_csv issues with dict for na_values #19227
Comments
Thanks for the report.
@jorisvandenbossche : So many bugs...that's all I can say. 😂 Luckily, the patch for them isn't so bad 😄 (PR coming for them soon).
Patches very buggy behavior of keep_default_na=False whenever na_values is a dict:
* Respect keep_default_na for a column that doesn't exist in the na_values dictionary
* Don't crash / break when na_value is a scalar in the na_values dictionary
In addition, clarifies documentation on the behavior of keep_default_na with respect to na_filter and na_values. Closes gh-19227.
Wow, talk about being late to the party! Sorry folks, I was tied up for the last two days and am only now getting to pay attention to my backlog. But it looks as though it's all sewn up - well done and thanks! :-)
Basically, I can't get a dictionary of na_values to work properly for me, no matter what I try. Pandas version is 0.22.0.
hack.csv contains:
Here are two variants of my code - the one with the list does what I expect, but the dict version doesn't:
output from list version
looks correct, but the dict version
although clearly paying attention to the columns I specify, is simply refusing to create any NaNs in those columns:
So... I'm stuck. Any suggestions? I really want to have column-specific NaN handling so I need the dict.
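For readers following along, here is a minimal sketch of what column-specific NaN handling with a dict is meant to look like. The CSV text and column names (id, temp, depth) are made up for illustration and are not the contents of hack.csv. The behavior shown assumes a pandas release that includes the patch referenced in the comments above, not 0.22.0.

```python
import io

import pandas as pd

# Hypothetical data: -999 is a "missing" sentinel in temp,
# but a legitimate reading in depth.
csv_text = "id,temp,depth\n1,-999,-999\n2,21.5,30\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"temp": ["-999"]},  # applies to the temp column only
    keep_default_na=False,         # do not add the built-in NaN strings
)

print(df)
print(df.dtypes)
```

With the fix in place, temp should contain a NaN in its first row, while depth keeps -999 as an ordinary integer.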
Additionally the dict version does create NaNs in columns I didn't specify in the dict, which also totally goes against my expectations for the combination of keep_default_na=False and an explicit value for na_values. Maybe I'm misreading the docs on that point.
Finally, you may notice that I used "214.008" (and other quoted numeric values) in the dict above. This is because I get a "not iterable" error when I provide unquoted numbers. This is despite that having been flagged as an issue and fixed a while back. This feels like another buglet to me.
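The two expectations in the paragraphs above can be written down as a small check. This is a sketch on invented data (columns a and b are hypothetical), and it assumes a pandas release that includes the patch referenced in the comments; on 0.22.0 the same call is what reportedly misbehaves.

```python
import io

import pandas as pd

# Hypothetical data: column b contains the literal string "NA".
csv_text = "a,b\n1,NA\n2,x\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"a": 2},     # unquoted scalar, not wrapped in a list
    keep_default_na=False,  # column b is not in the dict, so nothing in it is NaN
)

print(df)
```

Here the scalar 2 is accepted without a "not iterable" error and becomes NaN in column a, while "NA" in column b survives as a plain string because b has no entry in the dict and the defaults are switched off.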
Btw: to be picky, another doc-related quibble: I think the docs for keep_default_na are a bit misleading, in that they imply that keep_default_na=True should have no effect unless na_values is supplied (but in fact there is an effect even when na_values isn't supplied). It might be over-pedantic of me to care, but I feel that primary documentation really ought to be unambiguous. If anybody agrees with this pedantry I would be happy to propose a tweak ;-)
This issue was raised after I recently commented on two other issues with the problems described above, and @gfyoung suggested I ought to raise a new issue (comment links below).
#1657 (comment)
#12224 (comment)
Output of pd.show_versions()
NB: I quite likely have some "old" modules in the pile below, but I believe that I've updated pandas itself correctly, so any dependencies ought to have been updated too. If my errors aren't reproducible by others, it might indicate that there's a hidden version dependency(?) but I'm too much of a pandas noob to know how likely that is.
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 2.9.2
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None