CSV Writer can not write strings with nulls when no escapechar is specified #97503

Closed
eshmu opened this issue Sep 23, 2022 · 13 comments
Labels
stdlib Python modules in the Lib dir triaged The issue has been accepted as valid by a triager. type-bug An unexpected behavior, bug, or error

Comments

@eshmu

eshmu commented Sep 23, 2022

Bug report

This code fails in Python 3.10 with the error "need to escape, but no escapechar set":

import csv
import io

value = 'my\x00string'

with io.StringIO() as buf:
    writer = csv.writer(buf)
    writer.writerow([value])

However, this worked on Python 3.9 and earlier, and it should work since I shouldn't have to specify an escapechar when quoting is available.

I believe this is caused by the fix for this issue: #56387

See this comment on how the C snippet added for the fix introduced this bug: #56387 (comment)
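A version-agnostic form of the reproduction above, as a minimal sketch. `try_write` is a hypothetical helper name (it is not part of the csv module); it reports whether the write succeeded instead of raising, so the same script can be run under several interpreters. No outcome is asserted for the NUL case, since the result is reported to differ by Python version.

```python
import csv
import io


def try_write(value):
    """Write one single-field row; return ("ok", text) or ("error", message)."""
    buf = io.StringIO()
    try:
        csv.writer(buf).writerow([value])
    except csv.Error as exc:
        return ("error", str(exc))
    return ("ok", buf.getvalue())


# Outcome depends on the interpreter version, so it is only printed here.
print(try_write("my\x00string"))
```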

@eshmu eshmu added the type-bug An unexpected behavior, bug, or error label Sep 23, 2022
@mdboom mdboom added stdlib Python modules in the Lib dir triaged The issue has been accepted as valid by a triager. labels Sep 23, 2022
@mumbleskates

ran into this today and i can confirm. it's pretty hard to see why it happens, though if escapechar is set to 0 rather than the more sensible NOT_SET (uint32_t(-1)) that would certainly explain it. unset chars should definitely be using NOT_SET instead in all situations

@WillAyd
Contributor

WillAyd commented Nov 6, 2022

This appears to be working on the tip of main

@eshmu
Author

eshmu commented Nov 7, 2022

Yep, works with Python 3.11, it looks like. Closing.

@eshmu eshmu closed this as completed Nov 7, 2022
@mumbleskates

this is not fixed in 3.10.7, and i feel that it should probably be fixed in that branch as it is a clear regression. is 3.10 going into security-only maintenance with this bug still extant?

@merwok
Member

merwok commented Nov 8, 2022

If this is a clear bug (behaviour contradicts documentation) and you can post a reproducer, it could be fixed unless that would require too many changes or adding new parameters.

@mumbleskates

@merwok

Python 3.10.7 (main, Nov  2 2022, 18:49:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from csv import writer
>>> from io import StringIO
>>> writer(StringIO()).writerow(["\0"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: need to escape, but no escapechar set
>>> 

tip of main:

DIASET(_set_char_or_none, "escapechar", &self->escapechar, escapechar, NOT_SET);

tip of 3.10 (through 3.10.8):

DIASET(_set_char_or_none, "escapechar", &self->escapechar, escapechar, 0);

the fallback value should be NOT_SET, not 0. this was fixed in 3.11 here: b454e8e#diff-38fcce6bb475616052f5c9a0973eefd49489a4dff719f30e407534258e2a3ec3R487
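To illustrate why the fallback value matters, here is a toy Python model. It is not the C code in the csv module; it only assumes that the writer compares each character's code point against the stored escapechar value. `NOT_SET` and `collides_with_escapechar` are illustrative names chosen for this sketch.

```python
NOT_SET = -1  # stand-in for the C sentinel; no real code point equals it


def collides_with_escapechar(ch, escapechar_code):
    """Toy model: does this character compare equal to the stored escapechar?"""
    return ord(ch) == escapechar_code


# Fallback of 0: a NUL in the data is indistinguishable from "the escapechar".
print(collides_with_escapechar("\0", 0))        # True
# Fallback of NOT_SET: no character can ever match an unset escapechar.
print(collides_with_escapechar("\0", NOT_SET))  # False
```

With 0 as the "unset" value, a NUL in the data looks like a character that must be escaped, and the writer then finds no escapechar to escape it with.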

@merwok
Member

merwok commented Nov 8, 2022

From the way the github PR was titled («Add support for») and the discussion on the bugs.python.org ticket (mention of backward compat issue with docutils), my understanding is that this change was deemed a new feature, not a bug fix, and as such will not be backported.

@mumbleskates

Commit messages notwithstanding, this is a clear regression, as was observed in the original post of this issue:

Python 3.9.9 (main, Nov  7 2022, 21:30:33) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> import io
>>> csv.writer(io.StringIO()).writerow(["\0"])
5
>>> 
Python 3.8.12 (default, Nov  7 2022, 21:31:36) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> import io
>>> csv.writer(io.StringIO()).writerow(["\0"])
5
>>> 
Python 3.7.12 (default, Nov  7 2022, 21:34:47) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> import io
>>> csv.writer(io.StringIO()).writerow(["\0"])
5
>>> 
Python 3.7.0 (default, Nov  7 2022, 21:59:14) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> import io
>>> csv.writer(io.StringIO()).writerow(["\0"])
5
>>>
Python 3.6.15 (default, Nov  7 2022, 22:06:34) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> import io
>>> csv.writer(io.StringIO()).writerow(["\0"])
5
>>>

All the prior versions of python3 I can get my hands on at the moment write b'"\0"\r\n', and python3.11 writes b"\0\r\n". Both of these options are permissible and coherent, but python3.10 fails by reporting that an escape character is required when escaping is not needed: the only characters that get escaped in python's conception of CSV are the escape character and the quote character, and python's dialects use double-quote mode to use the quote character to escape itself.
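For reference, a short sketch of the double-quote behaviour described above, using the default dialect (minimal quoting, doublequote enabled, no escapechar). The sample field values are made up for illustration.

```python
import csv
import io

buf = io.StringIO()
# A field containing the quote character is wrapped in quotes and each
# embedded quote is doubled; no escapechar is involved.
csv.writer(buf).writerow(['say "hi"', "plain"])
print(repr(buf.getvalue()))
```

The first field is written as `"say ""hi"""` and the second is left unquoted.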

@serhiy-storchaka
Member

What is the result of writing strings like '"\0' or '\n\0' in different Python versions?

@mumbleskates

with a script like

import csv
import io
b = io.StringIO()
csv.writer(b).writerow(["\0", "\"\0", "\n\0", "\0\r\n"])
print(repr(b.getvalue()))

the results are:

3.6.15
'"\x00","""\x00","\n\x00","\x00\r\n"\r\n'

3.7.0
'"\x00","""\x00","\n\x00","\x00\r\n"\r\n'

3.7.12
'"\x00","""\x00","\n\x00","\x00\r\n"\r\n'

3.8.12
'"\x00","""\x00","\n\x00","\x00\r\n"\r\n'

3.9.9
'"\x00","""\x00","\n\x00","\x00\r\n"\r\n'

3.10.1
Traceback (most recent call last):
  File "/home/widders/csvtest.py", line 7, in <module>
    csv.writer(b).writerow(["\0", "\"\0", "\n\0", "\0\r\n"])
_csv.Error: need to escape, but no escapechar set

3.11.0
'\x00,"""\x00","\n\x00","\x00\r\n"\r\n'

@serhiy-storchaka
Member

After looking at the code, I agree that it is a consequence of #56387. But while Python 3.9 and earlier allowed writing strings containing the NUL character, the result could not be read back by the csv module. Reading produced the error _csv.Error: line contains NUL.

So I would not call it a clear regression. It was rather a bug fix. Now you get the error earlier, when writing the CSV, not when reading it.
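The read side can be checked with a small sketch like the following. `try_read` is a hypothetical helper name, and the input is the quoted-NUL text that earlier comments report older versions writing. The outcome is only printed, since it depends on the interpreter version.

```python
import csv
import io


def try_read(text):
    """Parse CSV text; return ("ok", rows) or ("error", message)."""
    try:
        rows = list(csv.reader(io.StringIO(text, newline="")))
    except csv.Error as exc:
        return ("error", str(exc))
    return ("ok", rows)


# A single quoted field containing one NUL character.
print(try_read('"\x00"\r\n'))
```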

@piskvorky

piskvorky commented Mar 6, 2024

Bitten by this regression too.

So I would not call it a clear regression. It was rather a bug fix. Now you get the error earlier, when writing the CSV, not when reading it.

This "fix" would have made sense if the exception had a sensible error message, such as _csv.Error: line contains NUL, to match the error on reading. And if it didn't disappear again in Python 3.11.

As it is, _csv.Error: need to escape, but no escapechar set is misleading (a wasted debug effort for developers), on top of being a weird quirky regression in Python 3.10.

I applaud @mumbleskates for his patience, in the face of the responses here.
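For code that would rather fail with an explicit message on any Python version, one possible approach is to check fields before handing them to the writer. This is only a sketch: `writerow_checked` is a hypothetical wrapper, not a csv module API, and raising ValueError is an arbitrary choice.

```python
import csv
import io


def writerow_checked(writer, row):
    """Reject NUL-containing string fields with an explicit message, then write."""
    for index, field in enumerate(row):
        if isinstance(field, str) and "\0" in field:
            raise ValueError(f"field {index} contains a NUL character")
    return writer.writerow(row)


buf = io.StringIO()
writerow_checked(csv.writer(buf), ["a", "b"])
print(repr(buf.getvalue()))
```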

@mumbleskates

thanks; i do think that this is (was?) still a regression, if pedantically so: while 3.9 could not successfully read nul characters, it could still write them... and there are other csv readers in the wild that one might be interested in sending the encoded data to, which is what i was doing.

for posterity, in my case i ended up being able to use JSONL with ujson, which writes much faster than CSV anyway even though it is larger (and the encoding step was my bottleneck). (this also uncovered a horrible memory leak in orjson which i have yet to chase down)
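A minimal sketch of the JSONL route, using the standard library json module rather than ujson so that it runs without extra dependencies. The sample rows are made up for illustration.

```python
import io
import json

rows = [["my\x00string", 1], ["plain", 2]]

buf = io.StringIO()
for row in rows:
    # json.dumps escapes control characters, so NUL is written as \u0000.
    buf.write(json.dumps(row) + "\n")

# Each line is an independent JSON document and decodes back to its row.
decoded = [json.loads(line) for line in buf.getvalue().splitlines()]
print(decoded == rows)
```

Because the NUL is escaped in the output, the file contains no raw NUL bytes and round-trips through the same module.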

7 participants