BUG: to_json()/read_json() can't correctly dump/load numbers requiring >15 digits of precision #38437
Comments
Thanks @mjuric for the detailed report! Not sure this is an easy fix ... a naive (and probably not smart even if it worked) hope of just changing the header definition of …

Looks like this update from …
Thanks @mzeitlin11 for looking into this! I see your point about the update being non-trivial (and requiring deprecations) :(. Could a temporary workaround be to document the current behavior? It took me a while to track down that the loss of precision on read from JSON is coming from pandas -- that isn't documented right now. Setting …
Definitely! A PR to document this limitation would be welcome.
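For reference, a minimal sketch of a read-side workaround using the documented `precise_float` option of `read_json` (whether this is the exact setting named in the truncated comment above can't be confirmed from the thread; the payload value is my own choice):

```python
import pandas as pd
from io import StringIO

# JSON literal carrying all 17 significant digits of 0.1 + 0.2
payload = '{"x":{"0":0.30000000000000004}}'

# The default fast decoder may drop the trailing digits on affected versions,
lossy = pd.read_json(StringIO(payload))
# while the slower strtod-based decoder preserves full double precision.
exact = pd.read_json(StringIO(payload), precise_float=True)

print(repr(lossy["x"][0]))  # may print 0.3 on affected versions
print(repr(exact["x"][0]))  # 0.30000000000000004
```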
I encountered a bug with a large truncation error: …

Output: … pandas version: 1.4.2/1.3.4
Code Sample, a copy-pastable example
Demonstration of the serialization issue (expected output shown as inline comments in the sketch below):
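A minimal sketch along these lines, since the original snippet is not shown above (the value `0.1 + 0.2` is a stand-in of mine, not necessarily the reporter's data):

```python
import pandas as pd

# 0.1 + 0.2 == 0.30000000000000004: the shortest exact decimal form of this
# double needs 17 significant digits.
df = pd.DataFrame({"x": [0.1 + 0.2]})

# double_precision is capped at 15, so the final digits are lost on encoding:
print(df.to_json(double_precision=15))  # {"x":{"0":0.3}} on affected versions

# Asking for the 17 digits a double can need is rejected outright:
# print(df.to_json(double_precision=17))  # raises ValueError (maximum is 15)
```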
Demonstration that deserialization silently disregards the last digit (output again inline in the sketch below):
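Again a sketch with a stand-in value, in the same spirit as the missing original:

```python
import pandas as pd
from io import StringIO

# A payload spelling out all 17 significant digits:
payload = '{"x":{"0":0.30000000000000004}}'
df = pd.read_json(StringIO(payload))

print(repr(df["x"][0]))                   # 0.3 on affected versions
print(df["x"][0] == 0.30000000000000004)  # False: trailing digits were dropped
```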
Problem description
64-bit floating-point numbers require up to 17 decimal digits to be fully round-tripped to a textual representation and back (e.g., see https://stackoverflow.com/questions/6118231/why-do-i-need-17-significant-digits-and-not-16-to-represent-a-double/). Pandas' vendored ujson-based encoder and decoder cut them off at 15 digits, causing loss of precision. This introduces inconsistencies between a DataFrame that has been transmitted from point A to point B via JSON serialization and one that has not (in our case, the issue cropped up while validating a REST API for a near-Earth asteroid orbit computation service).
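The 15-vs-17 digit distinction can be seen in plain Python (my own illustration, not from the report):

```python
# 0.1 + 0.2 is the classic double whose shortest exact decimal form needs
# 17 significant digits.
x = 0.1 + 0.2

print(format(x, ".15g"))              # 0.3
print(format(x, ".17g"))              # 0.30000000000000004
print(float(format(x, ".15g")) == x)  # False: 15 digits do not round-trip
print(float(format(x, ".17g")) == x)  # True:  17 digits always round-trip
```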
I traced this down to an old version of `ultrajsonenc.c` that was imported into the pandas codebase and forces this cut. Modern versions don't seem to have this limitation (and do away with the `double_precision` argument to `ujson.dump` altogether) -- e.g., see here.

Expected Output
Modern `ujson` seems to handle this fine, keeping the required precision (see the sketch below). A solution may be to update the version shipped with pandas.
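For comparison, a sketch of the round trip with a recent standalone `ujson` (2.x and later emit the shortest round-trip form and no longer take `double_precision`); the report's original snippet is not reproduced here:

```python
import ujson  # standalone PyPI ujson >= 2.0, not the copy vendored in pandas

x = 0.1 + 0.2
s = ujson.dumps(x)          # modern ujson emits the shortest round-trip form
print(s)                    # 0.30000000000000004
print(ujson.loads(s) == x)  # True: no precision is lost
```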
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit : b5958ee
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.1
setuptools : 49.6.0.post20201009
Cython : None
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : None