Skip to content

Fix NaN bug in ujson #46628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -819,7 +819,7 @@ I/O
- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`)
- Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`)
- :meth:`to_html` now excludes the ``border`` attribute from ``<table>`` elements when ``border`` keyword is set to ``False``.
-
- Bug in :func:`read_json` reading ``NaN`` values in corner cases (:issue:`46627`)

Period
^^^^^^
Expand Down
2 changes: 2 additions & 0 deletions pandas/_libs/src/ujson/lib/ultrajson.h
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,7 @@ enum JSTYPES {
JT_ARRAY, // Array structure
JT_OBJECT, // Key/Value structure
JT_INVALID, // Internal, do not return nor expect
JT_NAN, // Not A Number
JT_POS_INF, // Positive infinity
JT_NEG_INF, // Negative infinity
};
Expand Down Expand Up @@ -289,6 +290,7 @@ typedef struct __JSONObjectDecoder {
JSOBJ (*newTrue)(void *prv);
JSOBJ (*newFalse)(void *prv);
JSOBJ (*newNull)(void *prv);
JSOBJ (*newNaN)(void *prv);
JSOBJ (*newPosInf)(void *prv);
JSOBJ (*newNegInf)(void *prv);
JSOBJ (*newObject)(void *prv, void *decoder);
Expand Down
4 changes: 2 additions & 2 deletions pandas/_libs/src/ujson/lib/ultrajsondec.c
Original file line number Diff line number Diff line change
Expand Up @@ -293,9 +293,9 @@ JSOBJ FASTCALL_MSVC decode_numeric(struct DecoderState *ds) {
if (*(offset++) != 'a') goto SET_NAN_ERROR;
if (*(offset++) != 'N') goto SET_NAN_ERROR;

ds->lastType = JT_NULL;
ds->lastType = JT_NAN;
ds->start = offset;
return ds->dec->newNull(ds->prv);
return ds->dec->newNaN(ds->prv);

SET_NAN_ERROR:
return SetError(ds, -1, "Unexpected character found when decoding 'NaN'");
Expand Down
3 changes: 3 additions & 0 deletions pandas/_libs/src/ujson/python/JSONtoObj.c
Original file line number Diff line number Diff line change
Expand Up @@ -459,6 +459,8 @@ JSOBJ Object_newFalse(void *prv) { Py_RETURN_FALSE; }

JSOBJ Object_newNull(void *prv) { Py_RETURN_NONE; }

JSOBJ Object_newNaN(void *prv) { return PyFloat_FromDouble(Py_NAN); }

JSOBJ Object_newPosInf(void *prv) { return PyFloat_FromDouble(Py_HUGE_VAL); }

JSOBJ Object_newNegInf(void *prv) { return PyFloat_FromDouble(-Py_HUGE_VAL); }
Expand Down Expand Up @@ -510,6 +512,7 @@ PyObject *JSONToObj(PyObject *self, PyObject *args, PyObject *kwargs) {
JSONObjectDecoder dec = {
Object_newString, Object_objectAddKey, Object_arrayAddItem,
Object_newTrue, Object_newFalse, Object_newNull,
Object_newNaN,
Object_newPosInf, Object_newNegInf, Object_newObject,
Object_endObject, Object_newArray, Object_endArray,
Object_newInteger, Object_newLong, Object_newUnsignedLong,
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/io/json/test_pandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -1326,6 +1326,16 @@ def test_read_json_large_numbers2(self):
expected = DataFrame(1.404366e21, index=["articleId"], columns=[0])
tm.assert_frame_equal(result, expected)

def test_read_json_nans(self, nulls_fixture, request):
# GH 46627
json = StringIO('[NaN, {}, null, 1]')
result = read_json(json)
assert result.iloc[0, 0] is not None # used to return None here
assert np.isnan(result.iloc[0, 0])
assert result.iloc[1, 0] == {}
assert result.iloc[2, 0] is None
assert result.iloc[3, 0] == 1

def test_to_jsonl(self):
# GH9180
df = DataFrame([[1, 2], [1, 2]], columns=["a", "b"])
Expand Down
3 changes: 3 additions & 0 deletions pandas/tests/io/json/test_ujson.py
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,9 @@ def test_encode_time_conversion_dateutil(self):
def test_encode_as_null(self, decoded_input):
assert ujson.encode(decoded_input) == "null", "Expected null"

def test_decode_nan(self, decoded_input):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a test in one of the other modules in this same directory? I think this module was copied directly from ujson but is detached from what an end user would see, as no end user really imports ujson through pandas.

Would be a better test if we can re-create how an end user would come to this through one of the pandas objects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test in pandas.tests.io.json.test_pandas that uses the MWE from the bug report. I think this test should stay here as well, as a similar test now exists in ujson itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could consider "re-vendoring" ujson now that this bug is fixed in it's main branch as well. But I'm not sure if pandas has other modifications or not (and if their are, they probably should be backported anyway).

assert math.isnan(ujson.dumps("[NaN]")[0])

def test_datetime_units(self):
val = datetime.datetime(2013, 8, 17, 21, 17, 12, 215504)
stamp = Timestamp(val)
Expand Down