ENH: Add an option to json_normalize() to protect nested object(s) against flattening #40432

swiss-knight · 2021-03-14T17:32:03Z

Is your feature request related to a problem?

Not really. I simply wish I could use pandas to protect a nested structure / object / dict against flattening when using pd.json_normalize() on a JSON object, for example an API response.

Describe the solution you'd like

Something like this (see below for a complete example):

df = pd.DataFrame(pd.json_normalize(data, protect="foo.bar.baz"))

# or this, if only one object has that name
# (if several objects share the name, it should apply to all of them):
df = pd.DataFrame(pd.json_normalize(data, protect="baz"))

API breaking implications

I have no detailed idea what this could/would break.

Describe alternatives you've considered

My current workaround is to build two DataFrames from the same data: one without normalizing, the other with. I keep only the protected column(s) from the non-normalized one, concatenate them onto the normalized one, and then remove the redundant flattened columns from the final DataFrame.
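The workaround above can be sketched roughly as follows. The `records` sample and the `protected` list are made up for illustration and only cover top-level keys; `pd.DataFrame`, `pd.json_normalize`, and `pd.concat` are the only pandas calls involved.

```python
import pandas as pd

# Illustrative sample data (not the full API response).
records = [
    {
        "id": "17",
        "geometry": {"type": "Polygon", "crs": 4326},
        "properties": {"label": "Coniferous"},
    }
]

# Hypothetical list of top-level keys that should stay nested.
protected = ["geometry"]

# One DataFrame without normalizing: nested dicts stay as objects.
raw = pd.DataFrame(records)

# One DataFrame with normalizing: every nested dict is flattened.
flat = pd.json_normalize(records)

# Columns that came from flattening a protected key are redundant.
redundant = [
    col for col in flat.columns
    if any(col.startswith(key + ".") for key in protected)
]

# Keep the flattened columns minus the redundant ones, plus the raw protected columns.
result = pd.concat([flat.drop(columns=redundant), raw[protected]], axis=1)
print(result.columns.tolist())
```

Both frames are built from the same list, so they share the same default index and `pd.concat(..., axis=1)` lines the rows up without extra work.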

Additional context

Here's a dummy example:

import json

import pandas as pd

response = '''
    {
      "results": [
        {
          "geometry": {
            "type": "Polygon",
            "crs": 4326,
            "coordinates": 
              [[
                  [6.0, 49.0],
                  [6.0, 40.0],
                  [7.0, 40.0],
                  [7.0, 49.0],
                  [6.0, 49.0]
              ]]
          },
          "attribute": "layer.metadata",
          "bbox": [6, 40, 7, 49],
          "featureName": "Coniferous_Trees",
          "layerName": "State_Forests",
          "type": "Feature",
          "id": "17",
          "properties": {
            "resolution": "100",
            "Year": "2020",
            "label": "Coniferous"
          }
        }
      ]
    }
'''

data = json.loads(response)['results']
df = pd.DataFrame(pd.json_normalize(data))

Then:

>>> print(df.columns)
Index(['attribute', 'bbox', 'featureName', 'layerName', 'type', 'id',
       'geometry.type', 'geometry.crs', 'geometry.coordinates', # <-- the geometry has been flattened along with all the other objects
       'properties.resolution', 'properties.Year', 'properties.label'],
      dtype='object')

Desired behaviour:

df = pd.DataFrame(pd.json_normalize(data, protect="results.geometry"))

which would lead to:

>>> print(df.columns)

Index(['attribute', 'bbox', 'featureName', 'layerName', 'type', 'id',
       'geometry', # <-- the geometry element has been protected, it stays as a nested JSON structure in its own column in the DataFrame.
       'properties.resolution',  'properties.Year', 'properties.label'],
      dtype='object')
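Until such an option exists, the desired result can be approximated without building two DataFrames. This is only a sketch under assumptions: `normalize_protecting` is a hypothetical helper, not a pandas function, and it only handles top-level keys of each record (no dotted paths such as "foo.bar.baz").

```python
import pandas as pd


def normalize_protecting(records, protect):
    """Flatten records with pd.json_normalize, keeping the top-level keys
    listed in `protect` as nested objects (hypothetical helper)."""
    protect = set(protect)
    # Strip the protected keys so json_normalize never sees them.
    stripped = [
        {key: value for key, value in record.items() if key not in protect}
        for record in records
    ]
    flat = pd.json_normalize(stripped)
    # Re-attach each protected value untouched, one column per key.
    for key in sorted(protect):
        flat[key] = [record.get(key) for record in records]
    return flat


# Illustrative sample data (a reduced version of the example response).
sample = [
    {
        "geometry": {"type": "Polygon", "crs": 4326},
        "id": "17",
        "properties": {"resolution": "100", "Year": "2020"},
    }
]

df_protected = normalize_protecting(sample, protect=["geometry"])
print(df_protected.columns.tolist())
```

The protected column ends up last rather than in its original position; reordering the columns afterwards is straightforward if the order matters.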

Thanks for reading.


simonjayhawkins · 2022-06-02T18:35:36Z

Thanks @swiss-knight for the suggestion. This looks like a duplicate of #27241, so closing.

Feel free to add suggestions regarding the API to the discussion there.

swiss-knight added the Enhancement and Needs Triage labels Mar 14, 2021

jbrockmendel added the IO JSON label Mar 23, 2021

mroeschke added the Needs Discussion label and removed the Needs Triage label Aug 19, 2021

simonjayhawkins added the Nested Data label Jun 2, 2022

simonjayhawkins closed this as completed Jun 2, 2022

simonjayhawkins added the Duplicate Report label Jun 2, 2022
