@LTe LTe commented Apr 16, 2020

When `record_path` points to a value that is Iterable in Python but is not
a sequence (array) in JSON terms, `json_normalize` returns odd results.

```
>>> json_normalize([{'key': 'value'}], record_path='key')
   0
0  v
1  a
2  l
3  u
4  e
```
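
The per-character rows follow from Python's own rules: a `str` satisfies the Iterable protocol, so any code that only asks "can I iterate this?" walks it one character at a time. A minimal sketch in plain Python (no pandas involved; the variable names are illustrative only):

```python
from collections.abc import Iterable

record = {"key": "value"}
value = record["key"]

# A str is Iterable, so an "is it iterable?" check accepts it.
is_iterable = isinstance(value, Iterable)

# Iterating the string yields one "record" per character,
# which is where the five one-letter rows come from.
rows = list(value)

print(is_iterable, rows)
```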

According to RFC 8259 (https://tools.ietf.org/html/rfc8259), a JSON value MUST be
an object, array, number, or string, or one of the three literal names false,
null, true. But only two of them (object and array) should be treated as Iterable.

```
An object is an unordered *collection* of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.

An array is an ordered *sequence* of zero or more values.

--
https://tools.ietf.org/html/rfc8259#page-3
```

Based on that, `[{'key': 'value'}]` and `{'key': 'value'}` should not be
treated in the same way. In the `json_normalize` documentation, `record_path`
is described as `Path in each object to list of records`.

So when we translate JSON into Python objects, the only type that corresponds
to a list of records is a `list` (the JSON array, an ordered sequence). Based on
that, `record_path` should point to a `list`, not to any `Iterable`.
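
One way to picture the proposed rule is a helper that follows `record_path` and refuses anything that is not a `list`. This is only a sketch under stated assumptions: `pull_records` is a hypothetical name, not the pandas implementation, the error message is made up for illustration, and returning an empty list for null is an assumption about the existing special case.

```python
from typing import Any, List, Union


def pull_records(obj: dict, path: Union[str, List[str]]) -> list:
    """Hypothetical helper: follow `path` into `obj` and require a list."""
    keys = [path] if isinstance(path, str) else list(path)
    result: Any = obj
    for key in keys:
        result = result[key]

    if result is None:
        # Assumption: null is the already-handled special case and
        # simply means "no records".
        return []
    if not isinstance(result, list):
        # Strings, dicts, numbers, and booleans are all rejected here,
        # even though str and dict are Iterable.
        raise TypeError(
            f"record_path {path!r} points to {type(result).__name__}, "
            "expected a list of records"
        )
    return result
```

With this check, `pull_records({'key': [{'a': 1}]}, 'key')` returns the list unchanged, while `pull_records({'key': 'value'}, 'key')` raises `TypeError` instead of yielding one row per character.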

In the specs I added all possible values that are allowed in JSON and
should not be treated as a collection. There is a special case for the null
value that is already implemented.

|  type  |  value  | Iterable | Should be treated as list    |
|--------|---------|----------|------------------------------|
| object | {}      | Yes      | No (unordered collection)    |
| array  | []      | Yes      | Yes                          |
| number | 1       | No       | No                           |
| string | "value" | Yes      | No                           |
| false  | False   | No       | No                           |
| null   | None    | No       | No (Check pandas-dev#30148)  |
| true   | True    | No       | No                           |
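
The two right-hand columns can be checked directly in Python. The sketch below assumes the usual JSON-to-Python mapping (object to `dict`, array to `list`, null to `None`); the `samples` and `table` names are illustrative only.

```python
from collections.abc import Iterable

# One representative Python value per JSON type.
samples = {
    "object": {},
    "array": [],
    "number": 1,
    "string": "value",
    "false": False,
    "null": None,
    "true": True,
}

# For each type: (is it Iterable?, is it a list?)
table = {
    name: (isinstance(value, Iterable), isinstance(value, list))
    for name, value in samples.items()
}

for name, (iterable, is_list) in table.items():
    print(f"{name:<7} iterable={iterable!s:<5} list={is_list}")
```

Only the array row is true in both columns, which is the case `record_path` is meant to accept.
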
@LTe LTe force-pushed the json_normalize_array_path branch from 7eaa5d7 to 5e70d1f Compare April 16, 2020 07:44
@LTe LTe closed this Apr 16, 2020
@LTe LTe deleted the json_normalize_array_path branch April 17, 2020 05:04
Successfully merging this pull request may close these issues.

json_normalize should raise when record_path doesn't point to an array