BUG: Referring to a local variable in a query within a list comprehension errors #53156

Closed
2 of 3 tasks
yuvalwas opened this issue May 9, 2023 · 4 comments
Labels
Bug, Closing Candidate (may be closeable, needs more eyeballs), expressions (pd.eval, query)

Comments

@yuvalwas
yuvalwas commented May 9, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# does not work:
def test1():
    df = pd.DataFrame([0, 1], columns=['group'])
    g = 1
    print([df.query('group == @g') for _ in range(1)])
test1()

# works:
def test2():
    df = pd.DataFrame([0, 1], columns=['group'])
    g = 1
    print([df.query(f'group == {g}') for _ in range(1)])
test2()

# works:
def test3():
    df = pd.DataFrame([0, 1], columns=['group'])
    g = 1
    for _ in range(1):
        print(df.query(f'group == @g'))
test3()

# works:
df = pd.DataFrame([0, 1], columns=['group'])
g = 1
[df.query('group == @g') for _ in range(1)]

Issue Description

Referring to a local variable in a list comprehension with query's @ does not work, but it does work using f-strings. I tried on Pandas 1.5.3 and 2.0.1.

Expected Behavior

test1 should work the same way as test2.

Installed Versions

INSTALLED VERSIONS

commit : 37ea63d
python : 3.10.11.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-71-generic
Version : #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.1
numpy : 1.24.3
pytz : 2022.7
dateutil : 2.8.2
setuptools : 66.0.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.2.0
pyqt5 : None
/home/yuval/anaconda3/envs/cltorch/lib/python3.10/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

@yuvalwas added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels May 9, 2023
@topper-123
Contributor

df.query only accepts local variables inside a query string, see here.

Inside a list comprehension, calling locals() returns only the comprehension's own local scope, not the enclosing function's. This is a limitation of Python itself, AFAIK, so I don't think there is much we can do about it in pandas.

Whether this can be mitigated somehow I don't know, but that would be an enhancement to pandas, not a bug fix.

So I think this will be closed as wontfix, unfortunately. You can check if the docs can be made clearer in this regard.
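One possible mitigation on the caller's side, sketched here under the assumption that the keyword arguments of df.query are forwarded to pd.eval (which accepts a local_dict parameter): pass the variable explicitly instead of relying on frame lookup. The function name query_in_comprehension is made up for illustration.

```python
import pandas as pd

def query_in_comprehension():
    df = pd.DataFrame([0, 1], columns=["group"])
    g = 1
    # Naming g in local_dict references it directly, so its value is handed
    # to the query explicitly rather than looked up in the calling frame.
    return [df.query("group == @g", local_dict={"g": g}) for _ in range(1)]

results = query_in_comprehension()
print(results[0])
```

This keeps the @ syntax in the query string and should not depend on how the surrounding comprehension is scoped.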

@topper-123 added the expressions (pd.eval, query) and Closing Candidate (may be closeable, needs more eyeballs) labels and removed the Needs Triage label May 14, 2023
@topper-123
Contributor

A small cleanup of the above explanation: the lookup made from inside the comprehension tries locals() first, then globals(). So the code below works, but g resolves to the module-level value 0 rather than the function-local 1:

import pandas as pd

g = 0

def test1():
    df = pd.DataFrame([0, 1], columns=['group'])
    g = 1
    list(*(df.query('group == @g') for _ in range(1)))
test1()

This all follows from Python's scoping rules and can't be fixed in pandas.
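The locals-then-globals order can be imitated in plain Python. This is only a rough sketch, not pandas' actual code: lookup is a hypothetical helper, and sys._getframe is a CPython implementation detail. A generator expression is used because it gets its own frame on every Python version.

```python
import sys

g = 0  # module-level value

def lookup(name):
    """Hypothetical stand-in for an @-lookup: caller's locals, then globals."""
    frame = sys._getframe(1)  # the frame that called lookup()
    if name in frame.f_locals:
        return frame.f_locals[name]
    return frame.f_globals[name]

def from_function_body():
    g = 1
    # The caller's frame is the function itself, so the local g is found.
    return lookup("g")

def from_generator():
    g = 1
    # The caller's frame is the generator expression, which has no g of its
    # own, so the lookup falls through to the module-level g.
    return next(lookup("g") for _ in range(1))

print(from_function_body())  # 1
print(from_generator())      # 0
```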

@topper-123 closed this as not planned May 14, 2023
@yuvalwas
Author

Thank you for your answers. Why doesn't the same apply to f-strings (test2 in my example)? Shouldn't they be affected by the list comprehension scoping in the same way?

@topper-123
Contributor

The variables stored in locals() are local to the list comprehension, not the function:

def test1():
    g = 1
    print([locals() for _ in range(1)])
test1()

Notice that g isn't among the list comprehension's locals, so it isn't available to anything that looks variables up through locals().

AFAIK test2 works because list comprehensions are weird: inside one, the scope you see when calling locals() differs from the scope used when accessing a variable directly. I don't have a better explanation, sorry. There's this SO discussion about it.
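A likely explanation, offered as a sketch rather than a definitive answer: when a name is written directly inside the nested scope (as the f-string in test2 does), the compiler captures it as a closure variable; a name that appears only inside a string is never captured. The example uses a generator expression, which has its own scope on every Python version; list comprehensions behaved the same way until Python 3.12 started inlining them. Both function names are made up for illustration.

```python
def name_only_in_string():
    g = 1
    # g is never named inside the generator expression, so it is not
    # captured and is missing from that scope's locals().
    return next("g" in locals() for _ in range(1))

def name_referenced_directly():
    g = 1
    # The f-string names g, so the compiler captures it as a closure
    # variable; it is then usable directly and also shows up in locals().
    return next((f"group == {g}", "g" in locals()) for _ in range(1))

print(name_only_in_string())       # False
print(name_referenced_directly())  # ('group == 1', True)
```

By the same reasoning, 'group == @g' hides g inside a string literal, so nothing is captured and a lookup through the comprehension's frame cannot find it.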
