DataFrame query method - numexpr safety check fails #22435
Comments
I'd like to add another example to reinforce the above message. I came across this when using Pandas on a remote machine with Using the
Both of the
I ran the above example locally without I'm using Pandas 1.0.3, Python 3.8, numexpr 2.7.1.
I tested the example in my original post and the one from the commenter above, and both seem to work with v1.1. Going to close. Thanks for all the work put into pandas!
@machow I'll reopen this, as it does not appear to be fixed in 1.1.0 or master. Did you have numexpr installed?
Also, we should generally add tests to prevent regressions before closing issues.
Ah, thanks @simonjayhawkins. Two years elapsed from the time this was opened, and I didn't see a response from any pandas devs, so I assumed it may have gone stale. (I likely don't have time to submit a PR for this anymore, but am happy to test.) Edit: thanks for the pointer; after installing numexpr, the error reappears. Any feedback on the original suggestion?
Hello, is there anything new on this issue? The following code:

```python
import pandas as pd
df = pd.DataFrame([[0.0, 0.0], [0.0, 0.0]], columns=["A", "B"])
df.query("A.isnull()")
```

crashes with the message:

This only happens when numexpr is installed (pandas==1.3.2, numexpr==2.7.3).
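Until this is fixed, one option is to force the Python engine for this query; the same workaround is suggested later in the thread for `str.contains`. Another is to fall back to boolean masking. The sketch below reuses the two-column frame from the example above. As far as I can tell, the builtin-name check only runs inside the numexpr engine, so `engine="python"` sidesteps it. Treat this as a workaround, not a fix.

```python
import pandas as pd

df = pd.DataFrame([[0.0, 0.0], [0.0, 0.0]], columns=["A", "B"])

# engine="python" evaluates the expression without numexpr, so the
# numexpr builtin-name check is never reached.
via_query = df.query("A.isnull()", engine="python")

# The same selection with a plain boolean mask, no query() involved.
via_mask = df[df["A"].isnull()]

print(via_query)
print(via_mask)
```

Both selections should be empty here, because column `A` contains no missing values.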
The following solution was proposed by @machow:
Unfortunately, it is not enough. I replaced the code starting at `pandas/pandas/core/computation/expr.py` line 827 (commit c979bd8):
```python
@property
def names(self):
    """
    Get the names in an expression.
    """
    if is_term(self.terms):
        return frozenset([self.terms.name])
    return frozenset(term.name for term in com.flatten(self.terms))
```

with:

```python
@property
def names(self):
    """
    Get the names in an expression.
    """
    if is_term(self.terms):
        if self.terms.name.__hash__ is not None:
            return frozenset([self.terms.name])
        else:
            return frozenset()
    return frozenset(term.name for term in com.flatten(self.terms))
```

This is probably not the best approach, but it let me get further in the execution. At some point, in `pandas/pandas/core/computation/engines.py` line 121 (commit c979bd8),
a string is passed to
It seems that the result of the expression parsing with
If anyone is having trouble with

Example:

```python
orders.query("item_name.str.contains('Chicken')", engine="python")
```

You can also use old-style boolean masking instead:

```python
orders[orders.item_name.str.contains('Chicken')]
```
Code Sample, a copy-pastable example if possible
raises:

```
TypeError: unhashable type: 'numpy.ndarray'
```
Problem description
Background
When using numexpr, Pandas has an internal function, `_check_ne_builtin_clash`, that detects when a variable used in a method like `query` clashes with a numexpr built-in. Here's an example of the function raising an error as intended.
Mostly, the names it protects against are math functions like `sin`, `cos`, `sum`, etc.
Why my original example fails
The trouble with my original code is that `_check_ne_builtin_clash` checks the names on both sides of the `BinaryExpr` AST node corresponding to `"a.astype('int') < 2"`. It does this by putting them into a frozenset.
However, the LHS ends up being a `Constant` node whose name is `array([1,2,3])`, an ndarray, which is not hashable.
Solution
It seems like the helper function `_check_ne_builtin_clash` should treat any unhashable name as safe, since it can't conflict with the function names being searched for. If this seems like reasonable behavior, let me know and I will submit a PR!
Code for the function: `pandas/pandas/core/computation/engines.py`, lines 23 to 38 in b822535
Code for the variable names it looks for:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26
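Here is a minimal sketch of the proposed behavior. It assumes the clash check works by intersecting an expression's names with a set of numexpr built-in names. `NE_BUILTINS` is a hypothetical, abbreviated subset using only the names mentioned above; the real list is in `ops.py` at the link above. `naive_names`, `safe_names` and `check_builtin_clash` are illustrative functions, not pandas code, and a plain `NameError` stands in for the error pandas raises.

```python
import numpy as np

# Hypothetical, abbreviated subset of the numexpr built-in names that pandas guards.
NE_BUILTINS = frozenset({"sin", "cos", "sum"})


def naive_names(names):
    # Current behaviour: every name is hashed into a frozenset, so a single
    # ndarray name raises TypeError before any clash is checked.
    return frozenset(names)


def safe_names(names):
    # Proposed behaviour: an unhashable name can never equal a built-in's
    # string name, so it is skipped rather than hashed.
    kept = []
    for name in names:
        try:
            hash(name)
        except TypeError:
            continue
        kept.append(name)
    return frozenset(kept)


def check_builtin_clash(names):
    """Raise NameError if any hashable name shadows a numexpr built-in."""
    overlap = safe_names(names) & NE_BUILTINS
    if overlap:
        clashes = ", ".join(sorted(overlap))
        raise NameError(f"Variables in expression overlap with builtins: ({clashes})")


# Per the description above, the LHS name is an ndarray; the RHS constant is 2.
names = [np.array([1, 2, 3]), 2]

try:
    naive_names(names)
except TypeError as exc:
    print("naive check fails:", exc)

check_builtin_clash(names)  # no error: the ndarray is skipped, 2 is not a built-in
```

A real clash such as a column named `sin` would still be caught, because string names remain hashable and go through the intersection.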
Expected Output
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None