Don't drop unreduced variables #5393

Merged: 4 commits into pydata:master from the fix_reduce branch on Jun 12, 2021

malmans2
Contributor

Reduce methods such as mean drop non-numeric variables. However, there's no need to drop variables that are not actually reduced (i.e., when "reduce_dim" not in da.dims).
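
A minimal pure-Python sketch of the rule being proposed, for readers who want to see it in isolation. It is a simplified stand-in and not xarray's implementation: `keep_variable`, the `keep_unreduced` switch, and the example variables are hypothetical names introduced only for illustration.

```python
def keep_variable(var_dims, is_numeric, reduce_over, numeric_only=True, keep_unreduced=True):
    """Decide whether a data variable survives a reduction (simplified model)."""
    # Dimensions of this variable that the reduction would actually remove.
    reduce_dims = [d for d in var_dims if d in reduce_over]
    if keep_unreduced and not reduce_dims:
        # Nothing is reduced for this variable, so there is no reason to drop it.
        return True
    # Otherwise fall back to the dtype rule: non-numeric data is dropped
    # whenever numeric_only is set.
    return (not numeric_only) or is_numeric


# Hypothetical dataset description: name -> (dims, is_numeric)
variables = {
    "temperature": (("x", "y"), True),
    "station_name": (("y",), False),  # strings, no "x" dimension
    "comment": (("x",), False),       # strings along "x"
}


def surviving(keep_unreduced):
    """Names of the variables kept when reducing over "x"."""
    return [
        name
        for name, (dims, numeric) in variables.items()
        if keep_variable(dims, numeric, {"x"}, keep_unreduced=keep_unreduced)
    ]


old_behaviour = surviving(keep_unreduced=False)
new_behaviour = surviving(keep_unreduced=True)
```

In this model, `station_name` is dropped under the old rule and kept under the new one, while `comment` is dropped in both cases because it really does lie along the reduced dimension.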

cc: @rcaneill

@max-sixty
Collaborator

Great! Thanks @malmans2 !

@max-sixty added the plan to merge (Final call for comments) label May 28, 2021
@max-sixty
Collaborator

I'll leave this open for a day or so, as this is the sort of area where I'm going to miss some intricacy that someone else is familiar with.

Collaborator

@keewis keewis left a comment

thanks, @malmans2, looks good to me (it took me some time to really figure out how that condition works, though).

@@ -4986,7 +4986,8 @@ def reduce(
                     variables[name] = var
             else:
                 if (
-                    not numeric_only
+                    not reduce_dims

Collaborator

actually, wouldn't it make more sense to add unreduced variables without calling reduce? Something like

if not reduce_dims:
    variables[name] = var
elif not numeric_only or np.issubdtype(var.dtype, np.number) or var.dtype == np.bool_:
    # ...
    variables[name] = var.reduce(...)

Contributor Author

@malmans2 malmans2 May 31, 2021

That was my first idea, but a few tests failed, so it would be a breaking change. I think it mainly has to do with how attributes are handled by reduce.

Collaborator

well, attrs seems to be one of the issues (but one we could fix!). However, some reduction functions (e.g. std, var) need to run on variables that don't have the reduce dims, so my suggestion is actually wrong. See also the concept of invariant_0d introduced in #5207.

As I'm sure I will have forgotten about this in a few months: could you add a comment explaining this?
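
The point about std and var can be illustrated without xarray. The sketch below is plain Python: `reduce_over_no_dims` is a hypothetical helper, and the population statistics from the standard `statistics` module stand in for the real reduction functions. It assumes that reducing over an empty set of dimensions treats every element as its own one-element group.

```python
import statistics


def reduce_over_no_dims(values, func):
    """Apply a reduction when there is nothing to reduce over.

    Each element forms its own group of size one, so the output has the
    same length as the input.
    """
    return [func([v]) for v in values]


values = [1.0, 4.0, 9.0]

# The mean of a single element is that element: the variable is unchanged.
mean_result = reduce_over_no_dims(values, statistics.fmean)

# The standard deviation of a single element is zero: the variable changes.
std_result = reduce_over_no_dims(values, statistics.pstdev)
```

Under that assumption, a mean leaves the variable as it was, but a standard deviation turns it into zeros, so passing the variable through without calling the reduction would give a different answer.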

Contributor Author

Done!

Collaborator

Great, thanks!

Collaborator

last question: if I'm understanding this correctly, not numeric_only and all following lines will be overridden by not reduce_dims. Should we move that three lines down, below the dtype checks?
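
As a quick check on the ordering question, the sketch below enumerates every combination of the four conditions and compares the two orderings. The function and parameter names are hypothetical; each flag stands for one operand of the `or` chain in the diff, and the operands are assumed to have no side effects.

```python
from itertools import product


def reduce_dims_first(no_reduce_dims, not_numeric_only, is_number, is_bool):
    # Ordering as in the diff: the "not reduce_dims" check comes first.
    return no_reduce_dims or not_numeric_only or is_number or is_bool


def reduce_dims_last(no_reduce_dims, not_numeric_only, is_number, is_bool):
    # Alternative ordering: the check is moved below the dtype tests.
    return not_numeric_only or is_number or is_bool or no_reduce_dims


# Compare both orderings over all 16 combinations of the four flags.
same_everywhere = all(
    reduce_dims_first(*flags) == reduce_dims_last(*flags)
    for flags in product([False, True], repeat=4)
)
```

Since the two orderings agree for every input, moving the check changes only which operand short-circuits first, which is a readability choice rather than a behavioural one.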

@max-sixty
Collaborator

@malmans2 would you like to add a whatsnew?

@max-sixty
Collaborator

@malmans2 thank you and please feel free to add a whatsnew in another PR!

@max-sixty merged commit e17cf59 into pydata:master Jun 12, 2021
@malmans2 deleted the fix_reduce branch June 21, 2021 08:53
@malmans2 restored the fix_reduce branch June 21, 2021 08:53
@malmans2 deleted the fix_reduce branch June 21, 2021 08:53
Successfully merging this pull request may close these issues.

ds.mean('dim') drops strings dataarrays, even when the 'dim' is not dimension of the string dataarray