BUG: Raise a TypeError when record_path doesn't point to an array #33585
Merged
When `record_path` points to something that is Iterable but is not a sequence in the JSON world, we get odd results.

```
>>> json_normalize([{'key': 'value'}], record_path='key')
   0
0  v
1  a
2  l
3  u
4  e
```

Based on RFC 8259 (https://tools.ietf.org/html/rfc8259), a JSON value MUST be an object, array, number, or string, or one of the literals false, null, true. But only two of them should be treated as Iterable.

```
An object is an unordered *collection* of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.

An array is an ordered *sequence* of zero or more values.

-- https://tools.ietf.org/html/rfc8259#page-3
```

Based on that, `[{'key': 'value'}]` and `{'key': 'value'}` should not be treated in the same way. In the `json_normalize` documentation, `record_path` is described as `Path in each object to list of records`. So when we translate JSON to a Python object, the type we need to consider is a list (sequence). Based on that, `record_path` should point to a `list`, not to an arbitrary `Iterable`.

In the specs I added all possible values that are allowed in JSON and should not be treated as a collection. There is a special case for the null value that is already implemented.

| type   | value   | Iterable | Should be treated as list   |
|--------|---------|----------|-----------------------------|
| object | {}      | Yes      | No (unordered list)         |
| array  | []      | Yes      | Yes                         |
| number | 1       | No       | No                          |
| string | "value" | Yes      | No                          |
| false  | False   | No       | No                          |
| null   | Null    | No       | No (Check pandas-dev#30148) |
| true   | True    | No       | No                          |
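The rule in the table above can be sketched in plain Python. This is an illustration only, not the code changed in this PR: `pull_records` is a hypothetical helper name, and the exact error message is an assumption. It treats a list as records, treats null (`None`) as "no records", and raises `TypeError` for every other JSON value, including strings and objects that happen to be Iterable.

```python
from typing import Any, List, Union


def pull_records(obj: dict, path: Union[str, List[str]]) -> list:
    """Follow `path` into `obj` and return the list of records found there.

    Hypothetical helper for illustration; not pandas' implementation.
    """
    # Accept either a single key or a list of nested keys.
    keys = [path] if isinstance(path, str) else list(path)

    value: Any = obj
    for key in keys:
        value = value[key]

    # Special case: null means "no records here".
    if value is None:
        return []

    # Only a JSON array (Python list) counts as a sequence of records.
    # Strings and dicts are Iterable in Python but are rejected on purpose.
    if not isinstance(value, list):
        raise TypeError(
            f"{obj!r} has non-list value {value!r} for path {path!r}; "
            "must be list or null"
        )
    return value
```

With this check, `pull_records({'key': 'value'}, 'key')` raises `TypeError` instead of silently yielding the characters `v`, `a`, `l`, `u`, `e`, while `pull_records({'key': [{'a': 1}]}, 'key')` returns the list unchanged.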
cc @WillAyd
jreback approved these changes Apr 16, 2020
lgtm. @WillAyd
WillAyd approved these changes Apr 16, 2020
Yea this seems more correct
Thanks @LTe
CloseChoice pushed a commit to CloseChoice/pandas that referenced this pull request Apr 20, 2020
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff