json_normalize Support for Generators #26647
Comments
I think the issue can be expressed more succinctly. Basically this works:
In [22]: data = [{'id': 1, 'name': {'first': 'foo'}}, {'id': 2, 'name': {'first': 'bar'}}]
In [23]: json_normalize(data)
Out[23]:
   id name.first
0   1        foo
1   2        bar
But this doesn't:
In [24]: json_normalize(x for x in data)
Out[24]:
   id name.first
0   2        bar
I think this could be supported.
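The lost first row is consistent with a generator being partly consumed before the frame is built. The following is a minimal sketch of that mechanism, not pandas' actual code: `looks_nested` is a hypothetical pre-check, assumed here only to show how a short-circuiting `any()` can pull one record out of a single-pass iterator.

```python
data = [{"id": 1, "name": {"first": "foo"}}, {"id": 2, "name": {"first": "bar"}}]


def looks_nested(records):
    """Hypothetical pre-check: does any record hold a nested dict?

    any() stops at the first truthy item, so it reads only as many
    records as it needs to find one nested value.
    """
    return any(isinstance(v, dict) for r in records for v in r.values())


# A list can be iterated again after the check, so nothing is lost.
as_list = data
looks_nested(as_list)
remaining_from_list = list(as_list)

# A generator is single-pass: the record read by the check is gone.
as_gen = (x for x in data)
looks_nested(as_gen)
remaining_from_gen = list(as_gen)

print(len(remaining_from_list), len(remaining_from_gen))
```

With the list, both records are still available after the check; with the generator, only the second one remains, which matches the single-row output above.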
If you'd like to try a PR to support this, I think that would be OK.
What would be preferable? Raising an error or turning data into a list if data is an iterator? The latter is pretty straightforward. My (naive) solution would be the following:
I don't think using |
Closed by #33585.
Note that #33585 does not add support for generators, but clarifies the docstrings and raises an exception if the input is not a list (which is great; this is not intended as a complaint). Has the team decided it is undesirable to allow generators? Would something like the following be a reasonable approach?
Where |
The data argument in json_normalize is required to be a dictionary or a list of dictionaries. If an iterator is passed, unexpected behavior can occur. In particular, the first row is lost. See https://stackoverflow.com/questions/56362810/missing-first-document-when-loading-multi-document-yaml-file-in-pandas-dataframe.
I think this should be caught, either by raising an error (as proposed in #26646) or by ensuring that the output for the iterator is the same as for the list created by "materializing" the iterator.
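The two options can be sketched on the caller's side as a small guard applied before the data reaches json_normalize. This is an illustration under assumptions, not the change made in pandas: `materialize_records` and its `on_iterator` parameter are hypothetical names introduced here.

```python
from collections.abc import Iterator, Mapping


def materialize_records(data, on_iterator="materialize"):
    """Hypothetical guard for input destined for json_normalize.

    Dicts and lists pass through unchanged. Single-pass iterators
    (generators included) are either turned into a list or rejected.
    """
    if isinstance(data, Mapping):
        return data
    if isinstance(data, Iterator):
        if on_iterator == "raise":
            raise TypeError("expected a dict or a list of dicts, got an iterator")
        # Read the iterator exactly once, so no record is dropped later.
        return list(data)
    return data


data = [{"id": 1, "name": {"first": "foo"}}, {"id": 2, "name": {"first": "bar"}}]
records = materialize_records(x for x in data)
print(records == data)
```

The result of `materialize_records(...)` can then be passed on to json_normalize, so a generator and the equivalent list would take the same path.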