BUG: Solves errors when calling series methods in DataFrame.query with numexpr #43301

Merged
merged 10 commits into from
Sep 25, 2021
Conversation

AlexisMignon
Contributor

@AlexisMignon AlexisMignon commented Aug 30, 2021

The initial issue #22435 (the check for conflicting names failing) was actually hiding a deeper problem that leads to a failure when calling numexpr.evaluate().

The commit proposed here solves both issues by adding a temporary variable to the scope for the results of function / method calls when using the numexpr engine.
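
For context, the kind of call this PR targets is a Series method invoked inside the query string. A minimal sketch, assuming pandas 1.4 or later (the milestone this PR was merged into); the frame and column values are made up for illustration, and the import is guarded only so the snippet still runs where pandas is absent:

```python
try:
    import pandas as pd  # third-party; assumed to be version 1.4 or later
except ImportError:
    pd = None


def rows_with_zero_int_part():
    """Return the `b` values of rows where int(a) == 0, or None without pandas."""
    if pd is None:
        return None
    # Illustrative data; not taken from the PR.
    df = pd.DataFrame({"a": [0.2, 1.5, 0.7], "b": [1, 2, 3]})
    # `a.astype('int')` is a Series method call inside the query string.
    # With no `engine` argument, pandas uses numexpr when it is installed.
    return df.query("a.astype('int') == 0")["b"].tolist()


selected = rows_with_zero_int_part()
```

Passing `engine="numexpr"` explicitly would exercise the code path discussed here, at the cost of requiring numexpr to be installed.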

@AlexisMignon
Contributor Author

AlexisMignon commented Aug 30, 2021

The condition on numexpr may be too strong; maybe we should apply the trick only for unhashable results when using numexpr?

@AlexisMignon AlexisMignon changed the title Solves the problem arising when call series methods in query with numexpr [BUG] Solves errors when call series methods in query with numexpr Aug 30, 2021
@AlexisMignon AlexisMignon changed the title [BUG] Solves errors when call series methods in query with numexpr BUG: Solves errors when call series methods in query with numexpr Aug 30, 2021
@AlexisMignon AlexisMignon changed the title BUG: Solves errors when call series methods in query with numexpr BUG: Solves errors when call series methods in DataFrame.query with numexpr Aug 30, 2021
@AlexisMignon AlexisMignon changed the title BUG: Solves errors when call series methods in DataFrame.query with numexpr BUG: Solves errors when calling series methods in DataFrame.query with numexpr Aug 30, 2021
@AlexisMignon AlexisMignon marked this pull request as draft August 30, 2021 13:49
@AlexisMignon AlexisMignon marked this pull request as ready for review August 30, 2021 13:49
Member

@phofl phofl left a comment

Please add tests

@pep8speaks

pep8speaks commented Aug 31, 2021

Hello @AlexisMignon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-24 08:09:00 UTC

@@ -702,7 +702,11 @@ def visit_Call(self, node, side=None, **kwargs):
                 if key.arg:
                     kwargs[key.arg] = self.visit(key.value).value

-            return self.const_type(res(*new_args, **kwargs), self.env)
+            if self.engine == "numexpr":
Contributor

can we just always do this?

Contributor Author

Indeed, this was the point of one of my previous comments. It could depend on the type of the result returned by the function. But since the result has to be embedded in an expression string passed to numexpr.evaluate(), it's probably safer to use variable names instead of string representations.

Unless there is a benefit in some very specific cases, such as literals of base types.
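
The trade-off between embedding a string representation and binding a variable name can be sketched without numexpr. In the sketch below, Python's built-in `eval` stands in for the engine, and `TempScope` with its `add_tmp` method is a hypothetical helper written for this illustration, not the pandas implementation:

```python
import itertools


class TempScope:
    """Hypothetical scope that hands out unique names for temporary values."""

    def __init__(self, **variables):
        self.variables = dict(variables)
        self._ids = itertools.count()

    def add_tmp(self, value):
        # Store the value under a fresh name and return that name.
        name = f"tmp_{next(self._ids)}"
        self.variables[name] = value
        return name


def evaluate(expression, scope):
    # Stand-in for the engine: evaluate the string against the scope only.
    return eval(expression, {"__builtins__": {}}, scope.variables)


scope = TempScope(x=1.0)
call_result = float("inf")  # pretend this came from a method call

# Option 1: embed the string representation; "inf" is not a valid literal,
# so the engine sees an unknown name.
try:
    evaluate(f"x < {call_result!r}", scope)
    literal_outcome = "ok"
except NameError:
    literal_outcome = "NameError"

# Option 2: bind the value to a temporary name and reference that name.
variable_outcome = evaluate(f"x < {scope.add_tmp(call_result)}", scope)
```

The variable route works for any value the engine can consume, because the expression string only ever contains a name.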

Contributor

ok, if you can try that it would be great

Contributor Author

@AlexisMignon AlexisMignon Aug 31, 2021

I've done some tests with numexpr.

For instance this works:

numexpr.evaluate("2 * a + b", {"a": np.array([0.0, 0.0]), "b": [1, 2]})

while this doesn't:

numexpr.evaluate("2 * a + [1, 2]", {"a": np.array([0.0, 0.0])})

So I guess the number of cases where it's a bad idea to use variables instead of literals is quite small.
Embedding the literal would basically work only for integer literals (doing it for floats would lead to truncation). Is it worth making such an exception, knowing that it will work using variables anyway?
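
The two calls above can be put side by side in a small script. This is only a sketch: the imports are guarded so it runs where numexpr or numpy is missing, the function name is made up for illustration, and the exception type raised for the list literal is not assumed, only recorded:

```python
try:
    import numpy as np
    import numexpr
except ImportError:  # both are third-party; degrade gracefully without them
    np = numexpr = None


def compare_variable_and_literal():
    """Return (variable_result, literal_error_name), or None without numexpr."""
    if numexpr is None:
        return None
    a = np.array([0.0, 0.0])
    # The list is passed through local_dict and referenced by name.
    variable_result = numexpr.evaluate(
        "2 * a + b", local_dict={"a": a, "b": [1, 2]}
    )
    # The same list embedded as a literal in the expression string.
    try:
        numexpr.evaluate("2 * a + [1, 2]", local_dict={"a": a})
        literal_error = None
    except Exception as exc:  # exact exception type is deliberately not assumed
        literal_error = type(exc).__name__
    return variable_result.tolist(), literal_error


outcome = compare_variable_and_literal()
```

When numexpr is available, `outcome[0]` holds the result of the variable-based call and `outcome[1]` names whatever exception the literal-based call raised.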

Contributor

no, try to make the change to just fix this (of course, if you can add additional tests, that would be great)

Contributor Author

Sorry, I'm not sure I understand what additional changes you have in mind. As it is, the PR does the job.

As for the additional tests, while I agree that the proposed tests are more functional non-regression tests than unit tests, I'm not sure how to write proper unit tests here. If you have suggestions, I would be happy to implement them.

Contributor

no, what I mean is that I would like your patch to handle all of the cases without special-casing (which I think works).

@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement labels Aug 31, 2021
Contributor

@jreback jreback left a comment

can you add a whatsnew note in 1.4, under bug fixes; the indexing section is ok.

@jreback jreback added this to the 1.4 milestone Sep 25, 2021
@jreback jreback merged commit 004a1c9 into pandas-dev:master Sep 25, 2021
@jreback
Contributor

jreback commented Sep 25, 2021

very nice @AlexisMignon thanks!

Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement
Development

Successfully merging this pull request may close these issues.

DataFrame query method - numexpr safety check fails
4 participants