Fix NaN and Inf serialization #706

Closed
4 changes: 2 additions & 2 deletions jupyter_client/session.py
@@ -12,7 +12,6 @@
# Distributed under the terms of the Modified BSD License.
import hashlib
import hmac
-import json
import logging
import os
import pickle
@@ -27,6 +26,7 @@
compare_digest,
) # We are using compare_digest to limit the surface of timing attacks

+import simplejson as json
Member:

why the added dependency? I think we should be able to do this with stdlib json, since we always have before.

@martinRenou (Member Author), Sep 30, 2021:

Before, we had json_clean in ipykernel that was going through all messages (making copies of lists and dicts) and replacing math.inf and math.nan manually.

Unfortunately the standard json lib only allows two options:

  • allow_nan=True: nan is serialized to NaN and inf to Infinity, which are valid JavaScript but not valid JSON
  • allow_nan=False: an exception is thrown for nan and inf without giving us a chance to serialize them to null (valid JSON) in the default function
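For illustration, both stdlib behaviors can be seen in a couple of lines (the lambda passed as default is just there to show it never gets a chance to run for floats):

```python
import json
import math

# allow_nan=True (the default) emits JavaScript literals, not valid JSON:
print(json.dumps([math.nan, math.inf]))  # '[NaN, Infinity]'

# allow_nan=False raises before the `default` hook is ever consulted,
# because floats are a natively serializable type:
try:
    json.dumps(math.nan, allow_nan=False, default=lambda o: None)
except ValueError:
    print("ValueError: no chance to coerce nan to null")
```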

Member:

Argh, that's frustrating. And is support for nan/inf required somewhere? This is an artifact of earlier "never raise" goals, but I think maybe something that can't actually be serialized should raise. What breaks if these raise?

@martinRenou (Member Author), Sep 30, 2021:

Both bqplot and pythreejs are failing now because of this. Arguably, those libraries should provide valid JSON themselves.

Edit: for example, ipydatagrid has special serializers for those: https://github.com/bloomberg/ipydatagrid/blob/main/ipydatagrid/datagrid.py#L172-L178
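A serializer along those lines might look like the following (a hypothetical sketch in the spirit of such widget serializers, not the actual ipydatagrid code):

```python
import math

def to_json_safe(value, widget=None):
    # Hypothetical widget `to_json` serializer: map the non-finite floats
    # JSON cannot represent to None (serialized as null) before the
    # message ever reaches the packer.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: to_json_safe(v, widget) for k, v in value.items()}
    if isinstance(value, list):
        return [to_json_safe(v, widget) for v in value]
    return value
```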

Member:

simplejson does the wrong thing for bytes:

    In [2]: simplejson.dumps({'buf': b'bytes',}, default=repr)
    Out[2]: '{"buf": "bytes"}'

we should have a test that verifies this, because the switch to simplejson should have caused tests to fail.

Can you add

        ({"key": b"\xFF"}, {"key": "/w==\n"}),

to the list of pairs in test_json_default? And make sure the same json implementation is used?

I think the bytes behavior is more important than the unsupported floats. matplotlib figures rely on it, for one.
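For reference, the base64 round-trip that the suggested test pair expects can be sketched with stdlib json, where json_default below is a simplified stand-in for jupyter_client's helper:

```python
import json
from base64 import encodebytes

def json_default(obj):
    # Simplified stand-in for jupyter_client's json_default: bytes become
    # base64 text, which is what the ({"key": b"\xFF"}, {"key": "/w==\n"})
    # pair above expects. stdlib json hands bytes to this hook; simplejson
    # decodes them as UTF-8 first, bypassing it.
    if isinstance(obj, bytes):
        return encodebytes(obj).decode("ascii")
    raise TypeError(f"{obj!r} is not JSON serializable")

packed = json.dumps({"key": b"\xFF"}, default=json_default)
```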

I think the ideal behavior would be to trigger a DeprecationWarning when these values are seen, but still coerce them to null. I don't see a good way to do that with either simplejson or stdlib json, though.

If supporting these values is deprecated, maybe the thing to do is to fallback on the old inefficient path:

try:
    return dumps(obj, default=json_default)
except ValueError:
    # maybe double-check that it's the nan/inf error
    # emit a DeprecationWarning about inf/nan support
    obj = json_clean(obj)
    return dumps(obj, default=json_default)

What do you think?
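As a self-contained sketch of that fallback (json_clean here is a minimal stand-in for ipykernel's helper, and the warning text is illustrative, not the actual wording):

```python
import json
import math
import warnings

def json_clean(obj):
    # Minimal stand-in for ipykernel's json_clean: recursively replace
    # non-finite floats with None so they serialize as null.
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, list):
        return [json_clean(o) for o in obj]
    if isinstance(obj, dict):
        return {k: json_clean(v) for k, v in obj.items()}
    return obj

def json_packer(obj):
    try:
        # Fast path: strict serialization, raises ValueError on nan/inf.
        return json.dumps(obj, allow_nan=False, ensure_ascii=False).encode("utf8")
    except ValueError:
        warnings.warn(
            "Non-JSON-compliant float (nan/inf) in message; coercing to null. "
            "Support for these values is deprecated.",
            DeprecationWarning,
        )
        # Slow path: clean the message, then serialize again.
        return json.dumps(json_clean(obj), allow_nan=False, ensure_ascii=False).encode("utf8")
```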

@martinRenou (Member Author):

Yes, let's do that. Thanks!

Let's close this PR. I will add an extra test here like you suggested, and push a fix to ipykernel for the fallback to json_clean.

@martinRenou (Member Author):

> and push a fix to ipykernel

Actually, should we do this fallback to json_clean in jupyter_client? It's jupyter_client that does the dumps

Member:

> Actually, should we do this fallback to json_clean in jupyter_client? It's jupyter_client that does the dumps

Yeah, I think doing it here makes sense. I'm not sure ipykernel can add a catch in the right place.

import zmq
from traitlets import Any # type: ignore
from traitlets import Bool
@@ -96,7 +96,7 @@ def json_packer(obj):
        obj,
        default=json_default,
        ensure_ascii=False,
-        allow_nan=False,
+        ignore_nan=True,
    ).encode("utf8")


8 changes: 8 additions & 0 deletions jupyter_client/tests/test_jsonutil.py
@@ -3,6 +3,7 @@
# Distributed under the terms of the Modified BSD License.
import datetime
import json
+import math
import numbers
from datetime import timedelta
from unittest import mock
@@ -12,6 +13,7 @@
from dateutil.tz import tzoffset

from jupyter_client import jsonutil
+from jupyter_client.session import json_packer
from jupyter_client.session import utcnow

REFERENCE_DATETIME = datetime.datetime(2013, 7, 3, 16, 34, 52, 249482, tzlocal())
@@ -128,3 +130,9 @@ def test_json_default():
out = json.loads(json.dumps(val, default=jsonutil.json_default))
# validate our cleanup
assert out == jval


+def test_json_packer():
+    assert json_packer(math.nan) == b'null'
+    assert json_packer(math.inf) == b'null'
+    assert json_packer(-math.inf) == b'null'
1 change: 1 addition & 0 deletions requirements.txt
@@ -3,5 +3,6 @@ jupyter_core>=4.6.0
nest-asyncio>=1.5
python-dateutil>=2.1
pyzmq>=13
+simplejson
tornado>=4.1
traitlets