read_json segfaults with Python 3.7 #22817

hmgaudecker · 2018-09-24T10:22:05Z

Code Sample, a copy-pastable example if possible

Python 3.7.0 (default, Jun 28 2018, 13:15:42) 
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.23.4'
>>> pd.read_json(
...     '''{
...         "tsk_buy_sell": "Wie häufig kommt Einkaufen / Beschaffen / Verkaufen bei Ihrer Arbeit vor?",
...         "tsk_consult": "Wie häufig kommt Beraten / Informieren bei Ihrer Arbeit vor?",
...         "tsk_hospitality": "Wie häufig kommt Bewirten / Beherbergen / Speisenbereiten bei Ihrer Arbeit vor?",
...         "tsk_information_search": "Wie häufig kommt Informationen sammeln / Recherchieren / Dokumentieren bei Ihrer Arbeit vor?",
...         "tsk_law": "Gesetze auslegen, Vorschriften anwenden",
...         "tsk_marketing_pr": "Werben, Marketing oeffentlichkeitsarbeit, PR",
...         "tsk_operate": "Wie häufig kommt Überwachen / Steuern von Maschinen / Anlagen / techn. Prozessen bei Ihrer Arbeit vor?",
...         "tsk_organise": "Wie häufig kommt Organisieren / Planen und Vorbereiten von Arbeitsprozessen bei Ihrer Arbeit vor?",
...         "tsk_persuade": "Wie häufig müssen Sie andere überzeugen und Kompromisse aushandeln?",
...         "tsk_produce": "Wie häufig kommt Herstellen / Produzieren von Waren und Gütern bei Ihrer Arbeit vor?",
...         "tsk_quality_check": "Wie häufig kommt Messen / Prüfen / Qualitätskontrolle bei Ihrer Arbeit vor?",
...         "tsk_repair": "Wie häufig kommt Reparieren / Instandsetzen bei Ihrer Arbeit vor?",
...         "tsk_research_construct": "Wie häufig kommt Entwickeln / Forschen / Konstruieren bei Ihrer Arbeit vor?",
...         "tsk_supervision": "Haben Sie Mitarbeiter und Mitarbeiterinnen, für die Sie direkte Vorgesetzte sind?",
...         "tsk_teach": "Wie häufig kommt Ausbilden / Lehren / Unterrichten / Erziehen bei Ihrer Arbeit vor?",
...         "tsk_means_of_transportation": "Arbeit mit Transportmitteln"
...     }''',
...     typ='series',
...     orient='index'
... )
tsk_buy_sell                   Wie häufig kommt Einkaufen / Beschaffen / Verk...
tsk_consult                    Wie häufig kommt Beraten / Informieren bei Ihr...
tsk_hospitality                Wie häufig kommt Bewirten / Beherbergen / Spei...
tsk_information_search         Wie häufig kommt Informationen sammeln / Reche...
tsk_law                                  Gesetze auslegen, Vorschriften anwenden
tsk_marketing_pr                    Werben, Marketing oeffentlichkeitsarbeit, PR
tsk_operate                    Wie häufig kommt Überwachen / Steuern von Masc...
tsk_organise                   Wie häufig kommt Organisieren / Planen und Vor...
tsk_persuade                   Wie häufig müssen Sie andere überzeugen und Ko...
tsk_produce                    Wie häufig kommt Herstellen / Produzieren von ...
tsk_quality_check              Wie häufig kommt Messen / Prüfen / Qualitätsko...
tsk_repair                     Wie häufig kommt Reparieren / Instandsetzen be...
tsk_research_construct         Wie häufig kommt Entwickeln / Forschen / Konst...
tsk_supervision                Haben Sie Mitarbeiter und Mitarbeiterinnen, fü...
tsk_teach                      Wie häufig kommt Ausbilden / Lehren / Unterric...
tsk_means_of_transportation                          Arbeit mit Transportmitteln
dtype: object
>>> 
Segmentation fault (core dumped)

Expected Output

Same thing, without the segfault. Everything works fine in a Python 3.6 conda environment.

Also noted by @bsolomon1124 in #11344

Output of `pd.show_versions()`

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

troels · 2018-09-25T14:41:56Z

Hi @hmgaudecker

I can replicate this, but the problem doesn't seem to be in the json module. The following small program dumps core for me:

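import pandas as pd

# A single object-dtype string containing a non-ASCII character (ä)
s = pd.Series(['Wie häufig kommt Beraten / Informieren bei Ihrer Arbeit vor?'], dtype='object')
# Expected: raises an exception. Observed on Python 3.7.0: segfault.
s.astype('float64')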

Small variations make it stop crashing. I believe something is writing to a stray pointer, perhaps inside of numpy. It is likely related to the ä character somehow, but I can't quite pinpoint how.

hmgaudecker · 2018-09-25T16:06:06Z

Great, thanks. So the problem is that read_json attempts a type conversion internally and only continues with the object dtype upon failure?

troels · 2018-09-25T19:35:21Z

Well, the problem is that somewhere on the way to detecting what type is actually in the columns, pandas or numpy mishandles memory, either by writing outside an area of memory it has allocated or by, e.g., freeing memory that is still being used. The bug is in C or Cython code, but not in the json module.

My example code is just supposed to throw an exception, not dump core.
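To make the "try a conversion, fall back on failure" idea concrete, here is a minimal pure-Python sketch. It is not pandas's actual implementation; the helper name `try_convert_to_float` is hypothetical and only illustrates the pattern discussed above, in which a failed conversion should surface as an ordinary exception that the caller can handle.

```python
def try_convert_to_float(values):
    """Hypothetical helper: return (floats, True) if every value parses
    as a float, otherwise return (original values, False)."""
    try:
        return [float(v) for v in values], True
    except (TypeError, ValueError):
        # On a healthy interpreter a non-numeric string ends up here,
        # and the data is simply kept as-is (the "object" fallback).
        return list(values), False


numeric, numeric_ok = try_convert_to_float(["1.5", "2", "3e2"])
text, text_ok = try_convert_to_float(
    ["Wie häufig kommt Beraten / Informieren bei Ihrer Arbeit vor?"]
)
```

With this pattern the German question text is returned unchanged with a `False` flag. The crash reported in this issue happens inside the conversion attempt itself, before any fallback code can run.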

troels · 2018-09-27T20:05:07Z

This is in fact a CPython 3.7.0 bug in float() and will be fixed in Python 3.7.1, which will be released in a week or two:

https://bugs.python.org/issue34087
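For anyone who needs to guard against the affected interpreter, a small check along these lines may help. This is a sketch under the assumption, taken from this thread, that only 3.7.0 is affected and 3.7.1 contains the fix; the function name `affected_by_float_segfault` is made up for illustration.

```python
import sys


def affected_by_float_segfault(version_info=sys.version_info):
    """Hypothetical check: True only for Python 3.7.0, the release
    this thread identifies as having the float() bug."""
    return tuple(version_info[:3]) == (3, 7, 0)


if affected_by_float_segfault():
    print("Python 3.7.0 detected: consider upgrading to 3.7.1 or later.")
else:
    try:
        float("Wie häufig kommt Beraten / Informieren bei Ihrer Arbeit vor?")
    except ValueError as exc:
        # Expected behaviour on an unaffected interpreter
        print("float() raised ValueError as expected:", exc)
```

The check skips the `float()` call entirely on 3.7.0, since a segfault cannot be caught with `try`/`except`.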

WillAyd · 2019-04-15T16:30:16Z

Closing, as this was a Python bug which has since been resolved.

jbrockmendel added the Compat (pandas objects compatibility with NumPy or Python functions) and IO JSON (read_json, to_json, json_normalize) labels Sep 29, 2018

WillAyd mentioned this issue Sep 30, 2018

pd.read_json produces a dataframe that causes segfaults #22909

Closed

jbrockmendel added the Segfault (Non-Recoverable Error) label Nov 9, 2018

WillAyd closed this as completed Apr 15, 2019

WillAyd added this to the No action milestone Apr 15, 2019
