BUG: Raise a TypeError when record_path doesn't point to an array #33585

Merged 1 commit on Apr 16, 2020

LTe (Contributor) commented Apr 16, 2020

When `record_path` points to a value that is iterable in Python but is not
an array in JSON terms, `json_normalize` returns odd results:

```
>>> json_normalize([{'key': 'value'}], record_path='key')
   0
0  v
1  a
2  l
3  u
4  e
```

According to RFC 8259 (https://tools.ietf.org/html/rfc8259), a JSON value MUST be
an object, array, number, or string, or one of the literals false, null, or true.
Of these, only objects and arrays should be treated as iterable containers:

```
An object is an unordered *collection* of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.

An array is an ordered *sequence* of zero or more values.

--
https://tools.ietf.org/html/rfc8259#page-3
```

It follows that `[{'key': 'value'}]` and `{'key': 'value'}` should not be
treated the same way. The `json_normalize` documentation describes `record_path`
as `Path in each object to list of records`.

So when JSON is translated into Python objects, the type to look for is
a list (a sequence). `record_path` should therefore point to a `list`,
not to an arbitrary `Iterable`.
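The rule described above can be sketched in plain Python. This is a minimal illustration, not the code from this PR: `pull_records` is a hypothetical helper, and the error message wording is an assumption.

```python
from typing import Any


def pull_records(obj: dict, record_path: str) -> list:
    """Hypothetical helper: fetch obj[record_path] and insist on a list."""
    value: Any = obj[record_path]
    if isinstance(value, list):
        # A JSON array: the only value accepted as a list of records.
        return value
    if value is None:
        # JSON null is the one tolerated non-list value: no records.
        return []
    # Objects, strings, numbers and booleans are rejected, even though
    # dict and str are iterable in Python.
    raise TypeError(
        f"{obj!r} has non-list value {value!r} for path {record_path!r}; "
        "expected a list or null"
    )


print(pull_records({"key": [1, 2]}, "key"))
print(pull_records({"key": None}, "key"))
try:
    pull_records({"key": "value"}, "key")
except TypeError as exc:
    print("TypeError:", exc)
```

Checking `isinstance(value, list)` rather than iterability is what keeps a string such as `'value'` from being split into one row per character.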

In the tests I added every kind of value that is allowed in JSON and
should not be treated as a collection. The null value is a special case
that is already handled.

| type   | value   | Iterable | Should be treated as list |
|--------|---------|----------|---------------------------|
| object | {}      | Yes      | No (unordered collection) |
| array  | []      | Yes      | Yes                       |
| number | 1       | No       | No                        |
| string | "value" | Yes      | No                        |
| false  | false   | No       | No                        |
| null   | null    | No       | No (see pandas-dev#30148) |
| true   | true    | No       | No                        |
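
The two right-hand columns of the table above can be reproduced with the standard library. This is only an illustrative sketch: `classify` and `samples` are names made up here, and the values are the Python equivalents of each JSON type.

```python
from collections.abc import Iterable

# Python equivalents of each JSON value type listed in the table.
samples = {
    "object": {},
    "array": [],
    "number": 1,
    "string": "value",
    "false": False,
    "null": None,
    "true": True,
}


def classify(value):
    """Return (is iterable in Python, is a list) for a decoded JSON value."""
    return isinstance(value, Iterable), isinstance(value, list)


for json_type, value in samples.items():
    is_iterable, is_list = classify(value)
    print(f"{json_type:<6} iterable={is_iterable!s:<5} list={is_list}")
```

Three types are iterable (`dict`, `list`, `str`), but only `list` passes the stricter check, which is why an `Iterable` test is too permissive for `record_path`.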

jbrockmendel (Member) commented

cc @WillAyd

jbrockmendel added the IO JSON (read_json, to_json, json_normalize) label Apr 16, 2020
jreback added the Bug label Apr 16, 2020
jreback added this to the 1.1 milestone Apr 16, 2020

jreback (Contributor) left a comment

lgtm. @WillAyd

WillAyd (Member) left a comment

Yea this seems more correct

WillAyd merged commit cd52502 into pandas-dev:master Apr 16, 2020

WillAyd (Member) commented Apr 16, 2020

Thanks @LTe

LTe deleted the json_normalize_array_path branch April 17, 2020 05:04

Development

Successfully merging this pull request may close these issues.

json_normalize should raise when record_path doesn't point to an array