
BUG: to_json()/read_json() can't correctly dump/load numbers requiring >15 digits of precision #38437


Closed
mjuric opened this issue Dec 13, 2020 · 5 comments · Fixed by #54100
Labels
Docs IO JSON read_json, to_json, json_normalize

Comments

@mjuric

mjuric commented Dec 13, 2020

Code Sample, a copy-pastable example

Demonstration of the serialization issue:

import pandas as pd

df = pd.DataFrame([0.9884619112598676])
js = df.to_json(double_precision=15)   # 15 is the maximum; double_precision >= 16 raises ValueError

print(f"orig:           { df[0][0]} ({ df[0].dtypes})")
print(f"JSON: {js}")

Output:

orig:           0.9884619112598676 (float64)
JSON: {"0":{"0":0.988461911259868}}

Demonstration that deserialization silently disregards the last digit:

import numpy as np
import pandas as pd

js = '{"0":{"0":0.9884619112598676}}'
df = pd.read_json(js)
flt = np.float64("0.9884619112598676")
print(f"  JSON: {js}")
print(f" numpy:           {flt}")
print(f"Pandas:           {df[0][0]}  ({df[0].dtypes})")

Output:

  JSON: {"0":{"0":0.9884619112598676}}
 numpy:           0.9884619112598676
Pandas:           0.988461911259867  (float64)

Problem description

64-bit floating-point numbers require up to 17 significant decimal digits to be fully round-tripped to a textual representation and back (e.g., see https://stackoverflow.com/questions/6118231/why-do-i-need-17-significant-digits-and-not-16-to-represent-a-double/). Pandas' ujson-based encoder and decoder cut them off at 15 digits, causing loss of precision. This introduces inconsistencies when a pandas DataFrame is transmitted from point A to point B via JSON versus other serializations (in our case, the issue cropped up while validating a REST API for a near-Earth asteroid orbit computation service).
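
As a minimal illustration (plain Python, no pandas involved), 15 significant digits are not enough to round-trip this value, while 17 are:

x = 0.9884619112598676

# 15 significant digits: the reparsed value differs from the original
assert float(f"{x:.15g}") != x      # "0.988461911259868"

# 17 significant digits (or repr, which picks the shortest
# round-trippable form) reproduce the original value exactly
assert float(f"{x:.17g}") == x
assert float(repr(x)) == x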

I traced this down to an old version of ultrajsonenc.c that was imported into the pandas codebase and forces this cut. Modern versions of ujson don't seem to have this limitation (and do away with the double_precision argument to ujson.dump altogether) -- e.g., see the upstream ultrajson repository at https://github.com/ultrajson/ultrajson.

Expected Output

Modern ujson seems to handle this fine, keeping the required precision:

import ujson
print(ujson.__version__)
print(ujson.dumps(0.9884619112598676))

>>> 4.0.1
>>> 0.9884619112598676

A solution may be to update the version shipped with Pandas.
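
Until then, a possible interim workaround (just a sketch, not an official pandas API) is to route the round trip through the standard-library json module, whose encoder and decoder use repr-style shortest round-trip formatting:

import json
import pandas as pd

df = pd.DataFrame([0.9884619112598676])

# Write: to_dict() + stdlib json keeps full float precision
js = json.dumps(df.to_dict())
print(js)  # {"0": {"0": 0.9884619112598676}}

# Read: stdlib json + the DataFrame constructor bypasses pandas' ujson
# reader; note that the keys come back as strings after the round trip
df2 = pd.DataFrame(json.loads(js))
assert df2["0"]["0"] == df[0][0]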

Output of pd.show_versions()

INSTALLED VERSIONS

commit : b5958ee
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.5
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.1
setuptools : 49.6.0.post20201009
Cython : None
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : None

@mjuric mjuric added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 13, 2020
@mzeitlin11 mzeitlin11 added IO JSON read_json, to_json, json_normalize and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 24, 2020
@mzeitlin11 mzeitlin11 added this to the Contributions Welcome milestone Dec 24, 2020
@mzeitlin11
Member

Thanks @mjuric for the detailed report!

Not sure this is an easy fix ... a naive (and probably not smart even if it worked) hope of just changing the header definition of JSON_DOUBLE_MAX_DECIMALS to 17 does not fix this issue. Pulling over an updated version of ujson seems reasonable, but would require reintegration of pandas-specific changes to the code. Plus any API change required by the updated ujson version would require deprecations.

@mzeitlin11
Member

Looks like this improvement in ujson comes from doing the conversion with https://github.com/google/double-conversion, so I think using the updated ujson code would also require taking on another dependency.

@mjuric
Author

mjuric commented Jan 4, 2021

Thanks @mzeitlin11 for looking into this!

I see your point about the update being non-trivial (and requiring deprecations) :(.

Could a temporary workaround be to document the current behavior? It took me a while to track down that the loss of precision on read from JSON comes from pandas -- that isn't documented right now. Setting double_precision too high on write at least throws an exception.

@mzeitlin11 mzeitlin11 added Warnings Warnings that appear or should be added to pandas and removed Bug labels Jan 17, 2021
@mzeitlin11
Member

mzeitlin11 commented Jan 17, 2021

Definitely! A PR to document this limitation would be welcome.

@mzeitlin11 mzeitlin11 added Docs and removed Warnings Warnings that appear or should be added to pandas labels Jan 24, 2021
@sappersapper

sappersapper commented Sep 3, 2022

I encountered a case of this bug with a much larger truncation error:

import pandas as pd

pd.Series([3.1415926535897933e-15]).to_json(double_precision=15)

Output:
'{"0":0.000000000000003}'

pandas version: 1.4.2/1.3.4
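
This looks consistent with the limitation above: double_precision appears to cap the number of digits after the decimal point rather than the number of significant digits, so small-magnitude values lose nearly all of their precision (3.1415926535897933e-15 truncated to 15 decimal places is 0.000000000000003). For comparison, the same value survives the standard-library encoder untouched:

import json

# The stdlib encoder uses the shortest round-trippable representation
print(json.dumps(3.1415926535897933e-15))  # 3.1415926535897933e-15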
