@jbrockmendel
Member

any, all, and std go through GroupBy._get_cythonized_result instead of the more-standard WrappedCythonOp. I'm trying to refactor any/all to use the other path, and as a step toward that am trying to make group_any_all follow the same patterns as the other functions in libgroupby.

The implementation here looks to me like it should behave identically to the existing implementation, but it fails a bunch of tests (mostly in tests/groupby/test_any_all.py). Hoping @phofl can explain where I'm going wrong.

# so by Kleene logic the result is currently unknown
if out[lab, j] != flag_val:
    out[lab, j] = -1
    result_mask[lab, j] = 1
Member

@phofl phofl Mar 17, 2023

For any: if NA is encountered as the first value in the group, you set the mask to 1 here, but you don't reset it when you later find another value in the group that is not NA. You'll have to update result_mask when you find another value.

Member Author

Oh OK. I missed that we are checking out[lab, j] != ... here as opposed to values[i, j] != .... Thanks.
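
The point of this thread can be sketched in plain Python. This is only an illustration under stated assumptions, not the Cython code in libgroupby: the function name `group_any_all_sketch` is hypothetical, `None` stands in for NA, the data is one-dimensional, and NA is always handled by Kleene logic. The NA branch checks the running result `out[lab]`, and a later deciding value clears the mask again.

```python
from typing import List, Optional, Tuple


def group_any_all_sketch(
    values: List[Optional[bool]],
    labels: List[int],
    ngroups: int,
    val_test: str,
) -> Tuple[List[bool], List[bool]]:
    """Grouped any/all over nullable booleans (hypothetical helper)."""
    if val_test not in ("any", "all"):
        raise ValueError("val_test must be 'any' or 'all'")

    # flag_val is the value that settles the result as soon as it is seen:
    # True for any, False for all.
    flag_val = val_test == "any"

    # Start from the opposite of flag_val, which is the result for a group
    # in which no deciding value ever appears.
    out = [not flag_val] * ngroups
    result_mask = [False] * ngroups

    for val, lab in zip(values, labels):
        if lab < 0:
            continue  # a negative label means the row belongs to no group
        if val is None:
            # NA with no deciding value seen so far: the result is unknown
            # for now. Note that this checks out[lab], not the current value.
            if out[lab] != flag_val:
                result_mask[lab] = True
            continue
        if val == flag_val:
            # A deciding value settles the group, so clear any earlier mask.
            out[lab] = flag_val
            result_mask[lab] = False

    return out, result_mask


out, mask = group_any_all_sketch([None, True, None, False], [0, 0, 1, 1], 2, "any")
# Group 0 saw NA first and then True, so it is True and unmasked.
# Group 1 saw only NA and False, so it stays masked (unknown).
```

Without the `result_mask[lab] = False` line, group 0 above would stay masked even though a `True` appears after the NA, which is the failure described in the review comment.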

pp_kwargs["result_mask"] = result_mask

result = post_processing(result, inferences, **pp_kwargs)
result = post_processing(result, inferences, result_mask=result_mask)
Member

You'll probably have to remove the is_nullable flag from the std post_processing function (its default is False) and check for result_mask is not None instead.
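
One way to read this suggestion, as a rough sketch only: the post-processing step takes an optional `result_mask` and branches on whether it was passed, so no separate nullable flag is needed. The name `std_postprocess_sketch` is hypothetical, the data is held in plain lists rather than arrays, and `None` stands in for NA; the actual pandas function may differ.

```python
import math
from typing import List, Optional


def std_postprocess_sketch(
    variances: List[float],
    inferences: object = None,
    result_mask: Optional[List[bool]] = None,
) -> List[Optional[float]]:
    """Turn grouped variances into standard deviations (hypothetical helper).

    `inferences` is unused here; it is kept only so the call shape matches
    post_processing(result, inferences, result_mask=result_mask).
    """
    roots = [math.sqrt(v) for v in variances]
    if result_mask is None:
        # Non-nullable path: no mask was passed, return the plain values.
        return roots
    # Nullable path: positions flagged in the mask become NA (None here).
    return [None if masked else root for root, masked in zip(roots, result_mask)]


plain = std_postprocess_sketch([4.0, 9.0])
nullable = std_postprocess_sketch([4.0, 9.0], result_mask=[False, True])
```

With this shape the caller always passes `result_mask=result_mask`, and the presence of the mask alone selects the nullable path.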

Member

@phofl phofl left a comment

Thanks for tackling this; it was on my to-do list as well.

@jbrockmendel
Member Author

Your suggestion helped, green

@phofl phofl added the Groupby label Mar 17, 2023
@phofl phofl added this to the 2.1 milestone Mar 17, 2023
@phofl phofl merged commit 1f7a7f2 into pandas-dev:main Mar 17, 2023
@phofl
Member

phofl commented Mar 17, 2023

thx @jbrockmendel

@jbrockmendel jbrockmendel deleted the ref-group_any_all branch March 17, 2023 19:51