Skip to content

BUG: Fix csv.QUOTE_NONNUMERIC quoting in to_csv #13418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Jun 10, 2016

Closes #12922: "bug" traced to #12194 via bisection, where the float formatting was unconditionally casting everything to string.

I say "bug" (with quotations) because the changes to get_result_as_array did correctly cast everything to string as per the documentation (i.e. it had inadvertently patched a bug itself even though it was just a cleaning PR). However, the changes had overlooked the impact it would have on to_csv.

@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch from ed75bfb to f8c657c Compare June 10, 2016 05:23
@@ -2138,6 +2016,31 @@ def formatter(value):

return formatter

def _get_formatted_data(self):
"""
Returns formatted float values. This is different from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you making this a separate function ? the point was to eliminate this kind of thing;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my reasoning was made clear in the doc-string. get_result_as_array returns only strings (it supposed to be doing this for quite some time). However, formatting CSV output is not always string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point you refer to was well-intended but ultimately incorrect, and sure, we could add this to get_result_as_array, but that's an API change then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how so? it's an internal routine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_result_as_array - I wouldn't know that from the signature (no preceding _)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these entire module is internal - in reality none should be leading _

we have very inconsistent usage

Copy link
Member Author

@gfyoung gfyoung Jun 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Fair enough, then a note should be made about that in the module doc-string. I'll move this functionality into get_result_as_array then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fine

Copy link
Member Author

@gfyoung gfyoung Jun 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added note to format.py - unfortunately, integrating the no-format option from my new function into get_result_as_array caused a whole slew of test failures, so I just moved it into to_native_types instead.

@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch from f8c657c to 06f99c1 Compare June 10, 2016 06:01
@codecov-io
Copy link

codecov-io commented Jun 10, 2016

Current coverage is 84.23%

Merging #13418 into master will decrease coverage by 0.04%

@@             master     #13418   diff @@
==========================================
  Files           138        138          
  Lines         50929      50813   -116   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42925      42803   -122   
- Misses         8004       8010     +6   
  Partials          0          0          

Powered by Codecov. Last updated by f752886...3f212e4

@gfyoung gfyoung changed the title BUG, CLN: Clean and fix to_csv BUG: Fix quoting in to_csv Jun 10, 2016
@gfyoung gfyoung changed the title BUG: Fix quoting in to_csv BUG: Fix csv.QUOTE_NONNUMERIC quoting in to_csv Jun 10, 2016
@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch from 06f99c1 to 3f212e4 Compare June 10, 2016 06:41
@gfyoung
Copy link
Member Author

gfyoung commented Jun 10, 2016

@jreback : Moved engine parameter removal into separate PR and removed the _get_formatted_data function and move its code in to_native_types. Travis is still happy, so ready to merge if there are no other concerns.

@jreback jreback added the IO CSV read_csv, to_csv label Jun 10, 2016
@@ -1,4 +1,8 @@
# -*- coding: utf-8 -*-
"""
Internal module for formatting output data (e.g. `to_csv`).
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a very helpful comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I add exactly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the to_csv (or add that it provides formatters for csv, html, latex and well as display formatting). It doesn't support to_csv, rather the implemenation for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Done.

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Jun 10, 2016
@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch from 3f212e4 to 0602b2f Compare June 10, 2016 13:18
@gfyoung
Copy link
Member Author

gfyoung commented Jun 11, 2016

@jreback : Modified the relevant doc-strings and comments as requested. Ready to merge if there are no other concerns.

'c_bool': [True, False],
})

expected = """\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit, you can use

expected = textwrap.dedent(
    """\
    ,c_bool,c_float,c_int,c_string
    0,True,1.0,42.0,a
    1,False,3.2,,"b,c"
    """
)

to avoid the odd indentation. Not worth holding up a merge though, if you aren't able to change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd indentation is the MO it seems for anything CSV-related. I'll leave for now unless there are really strong objections. But thanks for mentioning though @TomAugspurger !

@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch 2 times, most recently from 1b16693 to 964f343 Compare June 15, 2016 09:30
@jreback jreback added this to the 0.18.2 milestone Jun 15, 2016
Float values were being quoted despite the quoting spec.
Bug traced to the float formatting that was unconditionally
casting all floats to string. Unconditional casting traced
back to commit 2d51b33 (pandas-devgh-12194) via bisection. This commit
undoes some of those changes to rectify the behaviour.

Closes pandas-devgh-12922.

[ci skip]
@gfyoung gfyoung force-pushed the to-csv-quote-bugfix branch from 964f343 to 8e53112 Compare June 15, 2016 15:50
@jreback jreback closed this in d814f43 Jun 16, 2016
@jreback
Copy link
Contributor

jreback commented Jun 16, 2016

ty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants