-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
API: read_csv, to_csv line_terminator keyword inconsistency #35399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
30c9b83
2f25460
572363a
b891030
ecd8ce3
ee55191
292fcdc
9e4ac71
1d0ba61
b59831e
b954874
ac0a7f1
bc55716
ee69a76
4d00fea
c015da5
73d6d11
1a6497f
3a88ef0
7fe8274
1c27b2c
1912aa2
f54df81
cea28d8
85ddf44
a28657c
2b1333f
0786617
5e87bbc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3010,6 +3010,7 @@ def to_csv( | |
escapechar: Optional[str] = None, | ||
decimal: Optional[str] = ".", | ||
errors: str = "strict", | ||
**kwargs, | ||
) -> Optional[str]: | ||
r""" | ||
Write object to a comma-separated values (csv) file. | ||
|
@@ -3108,8 +3109,12 @@ def to_csv( | |
Specifies how encoding and decoding errors are to be handled. | ||
See the errors argument for :func:`open` for a full list | ||
of options. | ||
kwargs | ||
Additional keyword arguments passed to read_csv for compatibility | ||
with `csv` module. Include lineterminator (an alternative to | ||
line_terminator: see above). | ||
|
||
.. versionadded:: 1.1.0 | ||
.. versionadded:: 1.2.0 | ||
|
||
Returns | ||
------- | ||
|
@@ -3141,6 +3146,9 @@ def to_csv( | |
|
||
from pandas.io.formats.csvs import CSVFormatter | ||
|
||
kwargs.setdefault("lineterminator", line_terminator) | ||
line_terminator = line_terminator or kwargs["lineterminator"] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. need to watch out for corner case where user passes both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok! Do we throw an error or would we use one (maybe |
||
|
||
formatter = CSVFormatter( | ||
df, | ||
path_or_buf, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -346,6 +346,12 @@ | |
values. The options are `None` for the ordinary converter, | ||
`high` for the high-precision converter, and `round_trip` for the | ||
round-trip converter. | ||
kwargs | ||
Additional keyword arguments passed to read_csv for compatibility | ||
with `csv` module. Include lineterminator (an alternative to | ||
line_terminator: see above). | ||
|
||
.. versionadded:: 1.2.0 | ||
|
||
Returns | ||
------- | ||
|
@@ -579,7 +585,7 @@ def read_csv( | |
compression="infer", | ||
thousands=None, | ||
decimal: str = ".", | ||
lineterminator=None, | ||
line_terminator=None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. again, in the signature, I think we should only have the one parameter and the compatibility keyword accepted though **kwargs |
||
quotechar='"', | ||
quoting=csv.QUOTE_MINIMAL, | ||
doublequote=True, | ||
|
@@ -595,6 +601,7 @@ def read_csv( | |
low_memory=_c_parser_defaults["low_memory"], | ||
memory_map=False, | ||
float_precision=None, | ||
**kwargs, | ||
): | ||
# gh-23761 | ||
# | ||
|
@@ -632,6 +639,8 @@ def read_csv( | |
engine = "c" | ||
engine_specified = False | ||
|
||
kwargs.setdefault("lineterminator", line_terminator) | ||
|
||
kwds.update( | ||
delimiter=delimiter, | ||
engine=engine, | ||
|
@@ -643,7 +652,6 @@ def read_csv( | |
quotechar=quotechar, | ||
quoting=quoting, | ||
skipinitialspace=skipinitialspace, | ||
lineterminator=lineterminator, | ||
header=header, | ||
index_col=index_col, | ||
names=names, | ||
|
@@ -681,6 +689,7 @@ def read_csv( | |
mangle_dupe_cols=mangle_dupe_cols, | ||
infer_datetime_format=infer_datetime_format, | ||
skip_blank_lines=skip_blank_lines, | ||
**kwargs, | ||
) | ||
|
||
return _read(filepath_or_buffer, kwds) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that @simonjayhawkins and I have a different point of view here but I am strongly against adding kwargs to read_csv. The signature is already massive and adding this only makes things worse
I think we either stick to adding just one keyword arg or just leave this for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reason i'm not keen on adding the alias to the signature is that we also have doublequote, escapechar, quotechar and skipinitialspace as parameters to read_csv. So if we have both line_terminator and lineterminator, some bright spark will want snake case equivalents of the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you guys be happy with @jbrockmendel solution? I think it doesn't add kwargs and
lineterminator
won't appear in the signatureThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A decorator would be a good solution, however some of our decorators 'lose' the signature in the docs e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_stata.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to keep both ad infinitum, it may be possible to mostly use
deprecate_kwarg
and just eat the warning.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins The "solution" to that issue is probably for the wrapper to rewrite the docstring with an explicit signature that can be created using
Signature
within the docstring. For example, after post processing the docstring forwould become
Of course, this would be for a different PR. The string variant is the method used to document the signature of Cython functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins I tried adding this one to
read_csv
and compiling. Seems ok but I can't say how robust that is