
BUG: GH11344 in pandas.json when file to read is big #11393


Merged
jreback merged 1 commit into pandas-dev:master on Oct 23, 2015

Conversation

kawochen
Contributor

closes #11344
taken from ultrajson/ultrajson#145

test code

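import json
import pandas
from pandas.compat import zip, range  # lazy zip/range on both Python 2 and 3
SIZE = 5*10**7  # number of key/value pairs in the generated file
FILENAME = 'generated.json'
# Write one large JSON object mapping each integer to itself
# (json.dump converts the integer keys to strings).
with open(FILENAME, 'w') as fileh:
    json.dump(dict(zip(range(SIZE), range(SIZE))), fileh)
# Read the file back with the pandas JSON parser.
with open(FILENAME) as fileh:
    pandas.json.load(fileh)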

@jreback
Contributor

jreback commented Oct 21, 2015

note that Travis is borked ATM

@jreback
Contributor

jreback commented Oct 21, 2015

@kawochen repush, Travis is fixed.

easy way to put in a test for this? (you can mark it slow, but still shouldn't take forever, or blow up memory, not sure if that's possible as Travis has a limited amount)

@jreback added the IO JSON and Compat labels Oct 21, 2015
@jreback added this to the 0.17.1 milestone Oct 21, 2015
@kawochen
Contributor Author

sorry about the noise - I had pushed again precisely when Travis was throwing another tantrum.
I can't think of a good way to write a test, but locally you could change SIZE_MAX to something like 10 and check that there is no segfault even when that path is taken. That matters far more for 32-bit environments than for 64-bit ones.
But that path shouldn't have been taken in OP's case.
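
For reference, one way a crash check could be scripted is to run the parse in a child process, so that a segfault shows up as a failed exit status instead of taking down the test runner. This is only a rough sketch, not part of this PR: the helper names write_big_json and loads_without_crashing are hypothetical, and the stdlib json module stands in for the parser under test (swap in whichever module exposing load(fileobj) you want to exercise).

```python
import json
import os
import subprocess
import sys
import tempfile


def write_big_json(path, size):
    """Write a JSON object with `size` integer-valued entries to `path`."""
    with open(path, "w") as fileh:
        json.dump(dict(zip(range(size), range(size))), fileh)


def loads_without_crashing(path, loader="json"):
    """Parse `path` in a child process; return True if the child exits cleanly.

    `loader` names a module exposing `load(fileobj)`. The stdlib `json`
    default is an assumption for illustration, not the parser fixed here.
    """
    code = (
        "import importlib, sys\n"
        "mod = importlib.import_module(sys.argv[1])\n"
        "with open(sys.argv[2]) as fileh:\n"
        "    mod.load(fileh)\n"
    )
    result = subprocess.run([sys.executable, "-c", code, loader, path])
    # On POSIX a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV), so only 0 counts as success.
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated.json")
        write_big_json(path, 10**5)  # far smaller than the 5*10**7 used above
        print(loads_without_crashing(path))
```

The size is kept small here so the check finishes quickly; memory limits on CI would still apply to a full-size file.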

@jreback
Contributor

jreback commented Oct 23, 2015

@kawochen yeah, let me give this a try on Windows.

can you put the test case generation code here for reference

@kawochen
Contributor Author

added to top

@jreback
Contributor

jreback commented Oct 23, 2015

ok, didn't crash on Windows (though it didn't finish, as I don't have a lot of memory and it was swapping)... oh well

jreback added a commit that referenced this pull request Oct 23, 2015
BUG: GH11344 in pandas.json when file to read is big
@jreback merged commit 9a6de4e into pandas-dev:master Oct 23, 2015
@jreback
Contributor

jreback commented Oct 23, 2015

thanks for the fix @kawochen

Labels
Compat pandas objects compatability with Numpy or Python functions IO JSON read_json, to_json, json_normalize
Development

Successfully merging this pull request may close these issues.

Segmentation fault when trying to load large json file
2 participants