Hey @jreback, I understand this issue isn't much use without a reproducible example. Unfortunately, I won't be able to provide one, as I can't say which features of the file are causing the issue without guessing. I figured it would still be good to share in order to (1) document that at least some subset of valid JSON files causes a seg fault in the current implementation, (2) note that the behavior differs between the REPL and the Python interpreter, and (3) provide a GDB stack trace to assist debugging if a similar issue is raised in the future. If I'm wrong on those counts, feel free to delete the issue wholesale.
Code Sample
Output:
Segmentation fault
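Since the interpreter exits with nothing but the message above, one option that may help narrow things down is Python's standard faulthandler module, which writes the Python-level traceback when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch, not part of the original script; the helper name enable_crash_traceback and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Enable faulthandler and send fatal-signal tracebacks to log_path.

    Hypothetical helper for illustration. The returned file object must
    stay open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread's stack, not just the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage sketch: call this at the top of the script, before the load step.
# log_file = enable_crash_traceback()
# ... code that triggers the seg fault ...
```

The same handler can also be switched on without editing the script by running python -X faulthandler produce_site_report.py, in which case the traceback goes to stderr.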
Problem description
Similar issue to #11344, with a 1.2G file (specifically 1216272 bytes) with 70+ keys across the JSON records and multiple nested keys.
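Because the real file cannot be shared, a synthetic file with a similar shape might be a starting point for a reproducible example. The sketch below is only a guess at that shape: the key names, value types, nesting depth, and the one-record-per-line layout are all assumptions, and there is no evidence that such a file triggers the same crash. It builds records with 72 top-level keys, some of them nested.

```python
import json
import random
import string


def make_record(n_flat=60, n_nested=12, rng=None):
    """Build one record with n_flat scalar keys and n_nested nested keys.

    All key names and value types are hypothetical stand-ins.
    """
    if rng is None:
        rng = random.Random(0)
    record = {}
    for i in range(n_flat):
        # Mix ints, floats, strings and nulls across the flat keys.
        record[f"field_{i}"] = rng.choice([
            rng.randint(0, 10**6),
            rng.random(),
            "".join(rng.choices(string.ascii_letters, k=8)),
            None,
        ])
    for j in range(n_nested):
        # Two levels of nesting under each nested key.
        record[f"nested_{j}"] = {"a": rng.randint(0, 9), "b": {"c": rng.random()}}
    return record


def write_json_lines(path, n_records, seed=0):
    """Write n_records synthetic records to path, one JSON object per line."""
    rng = random.Random(seed)
    with open(path, "w") as fh:
        for _ in range(n_records):
            fh.write(json.dumps(make_record(rng=rng)) + "\n")
```

Calling write_json_lines("synthetic.json", n_records) with n_records raised until the file approaches the reported size, then loading the result the same way as the original file, would show whether size and key count alone are enough to trigger the fault.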
GDB output:
Bizarrely, the seg fault doesn't occur in the ipython REPL, e.g. executing the script with %load produce_site_report.py is just fine.
Expected Output
No Segmentation fault.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Data File
The data file in question cannot be shared. When loaded successfully in the REPL, the produced dataframe (df.dtypes) has Length: 72, dtype: object. Full dtype list: