Enhancement include or exclude keys in json normalize #27262
Just mulling this over a bit more. Should we be giving consideration to use_keys instead?
Most other IO methods work by inclusion instead of exclusion (see the usecols parameter) and I'm wondering if that wouldn't be more applicable here as well. Curious to hear your thoughts.
Yes I thought about it too. I come from a usecase where I had a large nested What do you think? |
How about |
Few things I thought about. I understand the consistency part but the We can do two things
|
Option 2 is wonky from an API perspective. What is the problem with the following though?
use_keys = lambda x: x not in {'exclude1', 'exclude2'}
I'm not tied to the |
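For readers following along, here is a minimal sketch of how an inclusion callable like the one above can express exclusion. `filter_keys` is a hypothetical helper written only for illustration; it is not part of pandas or of this PR.

```python
def filter_keys(record, use_keys):
    """Keep only the top-level keys for which use_keys(key) is true.

    Hypothetical helper for illustration, not a pandas function.
    """
    return {k: v for k, v in record.items() if use_keys(k)}


# Exclusion written as an inclusion predicate, as suggested in the comment.
use_keys = lambda x: x not in {"exclude1", "exclude2"}

record = {"id": 1, "exclude1": "a", "exclude2": "b", "name": "x"}
kept = filter_keys(record, use_keys)
print(kept)
```

The point of the design is that a single inclusion-style parameter covers both cases, so no separate exclusion argument is needed.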
Okay. This makes sense. Let me do this. |
…ignore_keys_in_json_normalize
…ignore_keys_in_json_normalize
…ignore_keys_in_json_normalize
…iravi/pandas into enh/ignore_keys_in_json_normalize
Haven’t reviewed in depth yet but general comment on approach
Sorry I didn't mean to actually change the name to usecols but rather to support the same things, namely a string, list of strings and a callable as input. This was previously only supporting a callable. Does that make sense? |
Should I change it back to use_key? |
Sure I think so. There isn’t really the concept of a column in JSON
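As a rough sketch of what "support a string, list of strings and a callable" could look like, all three input types can be reduced to one predicate. The name `as_predicate` and the `None` means "keep everything" behaviour are assumptions made for this example, not something taken from the PR.

```python
def as_predicate(use_keys):
    """Turn None, a str, a list of str, or a callable into a key -> bool function.

    Hypothetical helper; the None-means-keep-all rule is an assumption.
    """
    if use_keys is None:
        return lambda key: True
    if callable(use_keys):
        return use_keys
    if isinstance(use_keys, str):
        # A bare string is treated as a one-element list of keys.
        use_keys = [use_keys]
    allowed = set(use_keys)
    return lambda key: key in allowed


by_str = as_predicate("a")
by_list = as_predicate(["a", "b"])
by_callable = as_predicate(lambda key: key.startswith("x"))
print(by_str("a"), by_list("c"), by_callable("xyz"))
```

Checking `callable` before `str` does not matter here because strings are not callable, but the string check has to come before building the set, otherwise `"ab"` would be split into the keys `"a"` and `"b"`.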
|
Will do.
|
…iravi/pandas into enh/ignore_keys_in_json_normalize
@bhavaniravi in general it's probably better if you push your changes once they are in a state for us to review. There are a few reasons for it:
None of the above is a big deal, feel free to push when you need to. But if there is no reason to push more often than at the end, it's probably worth being aware of the previous points. |
Sure. Will keep that in mind. |
@WillAyd I am kind of stuck at this point when the number of nesting levels is > 1 and when we use
Am I thinking right? |
So to summarize the two options given:
[{"a": 1},
 {"level_0": {"a": 1}},
 {"level_0": {"level_1": {"a": 1}}}]
Option 1 assuming
Option 2 as I think you mentioned is to require the full hierarchy, so you'd have to do something like
Thinking out loud, Option 1 seems easier from a usage perspective whereas Option 2 is more explicit. I might have a slight preference for Option 2 as it's a more conservative approach and it's usually easier to expand API functionality than to limit it, conceding that I still think Option 1 is more user friendly. Open to your thoughts on it for sure |
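To make the trade-off concrete, here is one possible reading of the two options applied to the three records above. The exact argument values from the original comment did not survive, so the `"a"` and `"level_0.a"` arguments, the dotted-path convention, and the helper names `flatten`, `select_by_leaf` and `select_by_path` are all assumptions for illustration.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"dotted.path": value} pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def select_by_leaf(obj, name):
    """Option 1 (as read here): match the key name at any nesting level."""
    return {p: v for p, v in flatten(obj).items() if p.split(".")[-1] == name}


def select_by_path(obj, path):
    """Option 2 (as read here): match only the full dotted path."""
    return {p: v for p, v in flatten(obj).items() if p == path}


data = [{"a": 1}, {"level_0": {"a": 1}}, {"level_0": {"level_1": {"a": 1}}}]

option1 = [select_by_leaf(rec, "a") for rec in data]
option2 = [select_by_path(rec, "level_0.a") for rec in data]
print(option1)
print(option2)
```

Under this reading, Option 1 picks up `a` in all three records, while Option 2 only matches the record where `a` sits directly under `level_0`.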
@WillAyd I'm leaning towards being explicit about the levels too. The current implementation doesn't support that. Let me fix that and get back |
@WillAyd It's getting way more complicated than I thought and I need your help sorting it out. Currently, we support 3 types of Now to incorporate levels into all these 3 types it is getting way too complicated.
|
Hey @bhavaniravi - I've been on vacation for a few weeks so sorry for lack of response. I'll be back in a few days and can review what you have again in more detail then |
@bhavaniravi just reading through your last comment again. Is your concern that having a user specify the entire path to a key is too rigid? If so agreed it's a little tough from a usage perspective but that goes back to some of the comments in #27262 (comment) So maybe as a third option we could explicitly document that use_key only works on top level keys of the JSON structure for now, and maybe leave it to a future enhancement to have that work further down in the hierarchy. I don't think that requires a lot of extra effort on top of what you have here but is explicit about the limitations of this. Do you think that might be easier? |
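A minimal sketch of that third option, assuming use_key is applied only to the top-level keys of each record and nested content is passed through untouched. `select_top_level` is a hypothetical name used for illustration, not the PR's implementation.

```python
def select_top_level(record, use_keys):
    """Keep only the listed top-level keys; nested dicts are left as they are.

    Hypothetical helper showing the "top level only" limitation.
    """
    keep = {use_keys} if isinstance(use_keys, str) else set(use_keys)
    return {k: v for k, v in record.items() if k in keep}


record = {"id": 1, "info": {"id": 2, "name": "x"}, "extra": 3}
trimmed = select_top_level(record, ["id", "info"])
print(trimmed)
```

Note that the nested `id` under `info` is neither matched nor filtered by the `"id"` entry; it survives only because its parent key `info` was selected. That is the limitation that would need to be documented.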
@bhavaniravi is this still active? Thoughts on previous comment? |
@@ -207,6 +261,23 @@ def json_normalize(
1 130 60 NaN Mose Reg
2 130 60 2.0 Faye Raker

Normalizes nested data up to level 1.
can you add an example with a str / list-of-str? are these useful? better to just make this a callable only?
The initial proposal was to provide a list of keys to ignore. After a discussion with @WillAyd, he suggested supporting a str and a list of str, consistent with how inclusion works in other modules, which made sense.
With callable only I'm not sure if we can achieve the multi-level support we are talking about in
#27262 (comment)
Let me know your thoughts
whereas the current implementation only normalizes the
I think we are both agreeing on users being explicit for options 1 and 2, leaving the callable for a future implementation. Is that right? |
If callable is a hangup then feel free to ignore for now, but I am under the impression the issue is more so dealing with nested levels. Can you be sure to add a test case like:
data = {
    "key1": {
        "should_match": 1,
        "no_match": 2
    },
    "should_match": 3
}
And making sure that |
…_keys_in_json_normalize
…_keys_in_json_normalize
@WillAyd Made the changes you asked for. I had to change and swap a few things to get it right. Now it allows use_keys to be level specific, e.g., |
pandas/io/json/_normalize.py
Outdated

if is_list_like(use_keys):
    return any(
        key.split(".")[-len(i.split(".")) :] == i.split(".") for i in use_keys
Can you break this up into a few lines instead of just one generator expression? I think it would be more readable that way
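One way the generator expression might be unrolled, shown here only as a sketch with the same suffix-matching logic as the diff line above. This standalone version takes `use_keys` as a list of dotted strings and drops the `is_list_like` branch, which is an assumption made to keep the example self-contained.

```python
def is_key_match(key, use_keys):
    """Return True if the dotted key path ends with any entry in use_keys."""
    key_parts = key.split(".")
    for candidate in use_keys:
        candidate_parts = candidate.split(".")
        # Compare the trailing segments of the key path with the candidate.
        trailing = key_parts[-len(candidate_parts):]
        if trailing == candidate_parts:
            return True
    return False


nested_hit = is_key_match("key1.should_match", ["should_match"])
nested_miss = is_key_match("key1.no_match", ["should_match"])
too_specific = is_key_match("should_match", ["key1.should_match"])
print(nested_hit, nested_miss, too_specific)
```

With suffix matching, a bare `"should_match"` matches both the top-level key and `key1.should_match`, whereas `"key1.should_match"` matches only the nested one.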
# current keypath matches the config in use_keys
# only dicts gets recurse-flatten
if (
    is_key_match(newkey, use_keys)
I don't think this validation is necessary for the change; can you remove is_key_match?
@bhavaniravi - could you merge master whenever you have some time |
@bhavaniravi closing as I think this has gone stale but certainly ping if you'd like to pick back up and can reopen |
json_normalize
#27241
git diff upstream/master -u -- "*.py" | flake8 --diff