Robustness improvement for normalize.py #26328
Changes from all commits
113e415
9715c95
c51ec4a
b6ea15c
dabf625
7b64d1b
b90b33f
d0dd982
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,6 +111,8 @@ def json_normalize(data, record_path=None, meta=None, | |
record_path : string or list of strings, default None | ||
Path in each object to list of records. If not passed, data will be | ||
assumed to be an array of records | ||
For an array of objects with missing key-value pairs in each record, | ||
the first record needs to include all key-value pairs | ||
meta : list of paths (string or list of strings), default None | ||
Fields to use as metadata for each record in resulting table | ||
meta_prefix : string, default None | ||
|
@@ -180,13 +182,21 @@ def json_normalize(data, record_path=None, meta=None, | |
0 1 | ||
1 2 | ||
""" | ||
|
||
def _pull_field(js, spec): | ||
result = js | ||
if isinstance(spec, list): | ||
for field in spec: | ||
result = result[field] | ||
else: | ||
result = result[spec] | ||
# GH26284 | ||
try: | ||
result = result[spec] | ||
if not (isinstance(result, list)): | ||
# Allows import of single objects into dataframe GH26284 | ||
result = [result] | ||
except KeyError: | ||
result = {} | ||
|
||
return result | ||
|
||
|
@@ -241,6 +251,12 @@ def _recursive_extract(data, path, seen_meta, level=0): | |
else: | ||
for obj in data: | ||
recs = _pull_field(obj, path[0]) | ||
if recs == {}: | ||
Same comment around missing key - need to be cognizant of the
Not sure I understand. Setting |
||
# GH26284 Fill Missing key in this record | ||
# requires all required keys in first record | ||
for key in records[0]: | ||
recs[key] = np.nan | ||
recs = [recs] | ||
|
||
# For repeating the metadata later | ||
lengths.append(len(recs)) | ||
|
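The two changed hunks in the diff above can be approximated outside pandas. The following is a minimal standalone sketch, not the PR's code: `pull_field` and `fill_missing` are hypothetical names, `math.nan` stands in for `np.nan`, and `template` plays the role of `records[0]` in the diff.

```python
import math


def pull_field(js, spec):
    """Approximate the patched _pull_field from the diff (illustrative only)."""
    result = js
    if isinstance(spec, list):
        # As in the diff, a missing key on a list path still raises KeyError.
        for field in spec:
            result = result[field]
    else:
        try:
            result = result[spec]
            if not isinstance(result, list):
                # A single object is wrapped so it is treated as one record.
                result = [result]
        except KeyError:
            # Missing key: signal it with an empty dict, as the patch does.
            result = {}
    return result


def fill_missing(recs, template):
    """Replace the empty-dict marker with one record of NaN values.

    `template` is assumed to hold every expected key, which is why the
    docstring change requires the first record to be complete.
    """
    if recs == {}:
        return [{key: math.nan for key in template}]
    return recs
```

For example, `pull_field({"b": 1}, "a")` returns `{}`, and passing that to `fill_missing` with a template of `{"x": 1}` yields a single record whose `x` value is NaN. Note that this path never consults `errors`, which is the concern raised in the review comments.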
We have an `errors` parameter which this ignores. I think we'll need to be aware of that here in some way, though, per my main comment, I don't think we should try to tackle that here.
Not sure I understand this. The only error we can catch here is a missing key? Any other error would already happen in the existing baseline?
IIUC this assumes that the user always wants to silently ignore missing keys, which is not desirable and makes for a confusing API, since we have an `errors` parameter that controls that behavior for `meta`.
Ah, understood. If we want to give the user control, there are clearly two ways:
(i) redefine `errors='ignore'` to cover both `meta` and `record_path`
(ii) introduce another error flag to differentiate between `meta` and `record_path`
Is there a convention or a preference in pandas before I implement this?
I think option (i).
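A rough sketch of what option (i) could look like, with the record-path lookup honoring the same `errors` value already used for `meta`. This is an assumption-laden illustration, not pandas code: `pull_records` is a hypothetical helper, and returning a single empty record for a missing key is one possible choice among several.

```python
def pull_records(obj, spec, errors="raise"):
    """Hypothetical helper: fetch records at `spec`, honoring `errors`."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")

    # Accept either a single key or a list of keys forming a path.
    path = spec if isinstance(spec, list) else [spec]

    result = obj
    try:
        for field in path:
            result = result[field]
    except KeyError:
        if errors == "ignore":
            # Assumption: emit one empty record so the object still yields
            # a row, which a tabular layer could later fill with NaN.
            return [{}]
        raise

    # Wrap a single object so callers always receive a list of records.
    return result if isinstance(result, list) else [result]
```

With `errors="raise"` a missing record path surfaces as a `KeyError`, so the user is never silently given filled-in rows; with `errors="ignore"` the object is kept as an empty record.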