-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix csv.QUOTE_NONNUMERIC quoting in to_csv #13418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ed75bfb
to
f8c657c
Compare
@@ -2138,6 +2016,31 @@ def formatter(value): | |||
|
|||
return formatter | |||
|
|||
def _get_formatted_data(self): | |||
""" | |||
Returns formatted float values. This is different from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you making this a separate function ? the point was to eliminate this kind of thing;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my reasoning was made clear in the doc-string. get_result_as_array
returns only strings (it supposed to be doing this for quite some time). However, formatting CSV output is not always string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point you refer to was well-intended but ultimately incorrect, and sure, we could add this to get_result_as_array
, but that's an API change then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how so? it's an internal routine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_result_as_array
- I wouldn't know that from the signature (no preceding _
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these entire module is internal - in reality none should be leading _
we have very inconsistent usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Fair enough, then a note should be made about that in the module doc-string. I'll move this functionality into get_result_as_array
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added note to format.py
- unfortunately, integrating the no-format option from my new function into get_result_as_array
caused a whole slew of test failures, so I just moved it into to_native_types
instead.
f8c657c
to
06f99c1
Compare
Current coverage is 84.23%@@ master #13418 diff @@
==========================================
Files 138 138
Lines 50929 50813 -116
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 42925 42803 -122
- Misses 8004 8010 +6
Partials 0 0
|
06f99c1
to
3f212e4
Compare
@jreback : Moved |
@@ -1,4 +1,8 @@ | |||
# -*- coding: utf-8 -*- | |||
""" | |||
Internal module for formatting output data (e.g. `to_csv`). | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a very helpful comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should I add exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the to_csv
(or add that it provides formatters for csv, html, latex and well as display formatting). It doesn't support to_csv
, rather the implemenation for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Done.
3f212e4
to
0602b2f
Compare
@jreback : Modified the relevant doc-strings and comments as requested. Ready to merge if there are no other concerns. |
'c_bool': [True, False], | ||
}) | ||
|
||
expected = """\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit, you can use
expected = textwrap.dedent(
"""\
,c_bool,c_float,c_int,c_string
0,True,1.0,42.0,a
1,False,3.2,,"b,c"
"""
)
to avoid the odd indentation. Not worth holding up a merge though, if you aren't able to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd indentation is the MO it seems for anything CSV-related. I'll leave for now unless there are really strong objections. But thanks for mentioning though @TomAugspurger !
1b16693
to
964f343
Compare
Float values were being quoted despite the quoting spec. Bug traced to the float formatting that was unconditionally casting all floats to string. Unconditional casting traced back to commit 2d51b33 (pandas-devgh-12194) via bisection. This commit undoes some of those changes to rectify the behaviour. Closes pandas-devgh-12922. [ci skip]
964f343
to
8e53112
Compare
ty |
Closes #12922: "bug" traced to #12194 via bisection, where the float formatting was unconditionally casting everything to string.
I say "bug" (with quotations) because the changes to
get_result_as_array
did correctly cast everything to string as per the documentation (i.e. it had inadvertently patched a bug itself even though it was just a cleaning PR). However, the changes had overlooked the impact it would have onto_csv
.