Python: Enhance completions of DataFrames with column information #601

Closed
petetronic opened this issue May 19, 2023 · 11 comments

@petetronic
Collaborator

petetronic commented May 19, 2023

Positron uses Jedi for its Python LSP. We want to enhance the completion suggestions provided by the LSP to include more information about the objects in the user's environment. This requires connecting the information from the user's session in the IPython kernel with the LSP.

One enhancement we want is for completions of DataFrame objects to include the dynamic column names of the data.

@petetronic petetronic added this to the Internal Preview milestone May 19, 2023
@petetronic
Collaborator Author

Google document with more detail on this feature.

@seeM seeM self-assigned this May 23, 2023
@seeM
Contributor

seeM commented May 25, 2023

Some findings so far:

Pandas DataFrame columns are already accessible as attributes, for example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"mycolumn": [7]})

In [3]: df
Out[3]:
   mycolumn
0         7

So I tried to see whether completing df.myc would complete to df.mycolumn in Positron (with the above dataframe in the namespace). It seems to work in the Console but not in the Source Editor.

I'm now digging into why that's the case.
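As a rough illustration of why "columns as attributes" is awkward for a static analyser, here is a toy sketch. `ToyFrame` is a hypothetical stand-in, not pandas' actual implementation; it only assumes that column attributes are resolved at runtime through `__getattr__` and advertised through `__dir__`, so they never appear in the class definition that a static tool reads.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas' real implementation."""

    def __init__(self, data):
        self._data = dict(data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real
        # attributes and methods always take precedence over columns.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise identifier-like column names next to regular attributes.
        columns = {c for c in self._data if isinstance(c, str) and c.isidentifier()}
        return sorted(set(super().__dir__()) | columns)


df = ToyFrame({"mycolumn": [7]})
print(df.mycolumn)                    # [7]
print("mycolumn" in dir(df))          # True
print(hasattr(ToyFrame, "mycolumn"))  # False: the class itself knows nothing about it
```

The last line is the relevant one: the attribute exists only on the live object, so a completion engine has to inspect the object in the namespace rather than the source text to find it.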

@seeM
Contributor

seeM commented May 25, 2023

I've narrowed it down with the script below. It seems that if Jedi is able to statically analyse the code snippet it receives, it prefers that to any existing variables in the passed namespaces. Might be a bug or feature request to configure this (I haven't yet double-checked if it is configurable). Seems related to davidhalter/jedi#1857 (comment) although not quite the same.

import pandas as pd
from jedi.api import Interpreter

df = pd.DataFrame({'mycolumn': [7]})
namespaces = [{'df': df}]

# These completions include the `mycolumn` attribute.
code = 'df.'
interpreter = Interpreter(code, namespaces)
completions = interpreter.complete(1, 3)
print([completion.complete for completion in completions if completion.complete == 'mycolumn'])
# ['mycolumn']

# These completions don't include the `mycolumn` attribute.
# The only difference is that `code` contains the import and definition of `df`, which seems to
# make Jedi prefer static analysis and ignore the namespace
code = '''
import pandas as pd
df = pd.DataFrame({'mycolumn': [7]})
df.
'''.strip()
interpreter = Interpreter(code, namespaces)
completions = interpreter.complete(3, 3)
print([completion.complete for completion in completions if completion.complete == 'mycolumn'])
# []

@seeM seeM added the blocked label Jun 7, 2023
@seeM
Contributor

seeM commented Jun 7, 2023

Marking this as "blocked" while we wait for a response on our issue in the Jedi repo. If that doesn't happen soon I'll look into a workaround.

@seeM
Contributor

seeM commented Jun 29, 2023

The Jedi maintainers' stance is that this case is too specific for them to handle. Some thoughts on solutions when this is picked up again:

  • An ideal solution would identify the object being completed, and if it exists in the namespace, use dir(object) to find completions.
    • I'm not sure how tricky it is to identify the object being completed. Jedi must be doing this under the hood, though.
  • A quicker, less ideal workaround is to do two completions: (1) the original one, and (2) where code only contains the single line being completed – then merge and deduplicate their results.
  • We may also want to look into IPython's completion system, which wraps Jedi, but also handles magic commands.
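The first two ideas above can be sketched roughly as follows. This is a minimal sketch, not what was eventually merged: `namespace_completions` and `merge_completions` are hypothetical helper names, the regex only handles a single `name.prefix` expression at the end of the line, and the "static" list stands in for whatever Jedi would return.

```python
import re

# Matches a trailing "name.prefix" expression, e.g. "print(df.myc" -> ("df", "myc").
# The lookbehind rejects chained access such as "a.b.myc", which this sketch
# does not try to resolve.
_ATTRIBUTE_RE = re.compile(r"(?<![\w.])([A-Za-z_]\w*)\.(\w*)$")


def namespace_completions(line, namespace):
    """Complete attributes of the object at the end of `line` using dir()."""
    match = _ATTRIBUTE_RE.search(line)
    if match is None:
        return []
    name, prefix = match.groups()
    if name not in namespace:
        return []
    return [attr for attr in dir(namespace[name]) if attr.startswith(prefix)]


def merge_completions(*sources):
    """Concatenate completion lists, dropping duplicates in first-seen order."""
    return list(dict.fromkeys(item for source in sources for item in source))


class Toy:
    """Hypothetical object whose attributes only exist at runtime."""

    def __init__(self):
        self.mycolumn = [7]
        self.myindex = [0]


namespace = {"df": Toy()}
dynamic = namespace_completions("print(df.myc", namespace)
static = ["mean", "mycolumn"]  # pretend result of static analysis

print(dynamic)                             # ['mycolumn']
print(merge_completions(dynamic, static))  # ['mycolumn', 'mean']
```

Listing the namespace results first means runtime information wins when both sources offer the same name, which matches the preference discussed in this thread.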

@seeM seeM removed blocked labels Jun 29, 2023
@seeM seeM removed their assignment Aug 9, 2023
@seeM seeM self-assigned this Dec 11, 2023
@seeM
Contributor

seeM commented Dec 14, 2023

This is ready for review. You should now be able to complete the columns of pandas and polars dataframes via dict access (e.g. df[') and via attribute access (e.g. df.).

Secondly, you should also see a preview of a dataframe/series in the completion documentation pop-up window.

Here's a snippet to create pandas and polars dataframes:

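import numpy as np, pandas as pd, polars as pl
data = np.random.randn(100, 3)
# zip pairs each letter with one row of `data` and stops after 26 rows,
# so both frames get 26 columns (a-z) with 3 values each.
pd_df = pd.DataFrame({k: v for k, v in zip('abcdefghijklmnopqrstuvwxyz', data)})
pl_df = pl.DataFrame({k: v for k, v in zip('abcdefghijklmnopqrstuvwxyz', data)})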
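For readers curious how completion via dict access (`df['`) can work in principle, here is a minimal sketch. It relies on IPython's `_ipython_key_completions_` convention, where an object returns the keys it is willing to complete. Whether Positron's implementation uses this hook is not stated in this thread, and `key_completions` and `ToyFrame` are hypothetical names used only for illustration.

```python
import re
from collections.abc import Mapping

# Matches an unfinished string subscript at the end of a line,
# e.g. "df['al" -> ("df", "'", "al").
_KEY_RE = re.compile(r"""(?<![\w.])([A-Za-z_]\w*)\[(['"])([^'"]*)$""")


def key_completions(line, namespace):
    """Complete string keys for the object being subscripted at the end of `line`."""
    match = _KEY_RE.search(line)
    if match is None:
        return []
    name, _quote, prefix = match.groups()
    obj = namespace.get(name)
    # Prefer the IPython key-completion hook when the object provides one.
    get_keys = getattr(obj, "_ipython_key_completions_", None)
    if callable(get_keys):
        keys = get_keys()
    elif isinstance(obj, Mapping):
        keys = obj.keys()
    else:
        return []
    return [key for key in keys if isinstance(key, str) and key.startswith(prefix)]


class ToyFrame:
    """Hypothetical stand-in for a dataframe that exposes its column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def _ipython_key_completions_(self):
        return self.columns


namespace = {"df": ToyFrame(["alpha", "beta", "alps"]), "d": {"apple": 1}}
print(key_completions("df['al", namespace))   # ['alpha', 'alps']
print(key_completions('x = d["a', namespace))  # ['apple']
```

Because this path works from keys rather than attributes, it also covers libraries such as polars that do not expose columns as attributes.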

@petetronic
Collaborator Author

petetronic commented Dec 17, 2023

I think there's a regression in Jedi completions after merging the latest upstream. I'll chat with you this coming week about what I see in the output channel.

@seeM
Contributor

seeM commented Dec 18, 2023

I don't think this is related to our changes. I'm not 100% sure but I suspect it's related to upstream changes around managing the PYTHONPATH.

Here is the corresponding upstream issue: microsoft/vscode-python#22659.

Here are the logs:

[Python] [pygls.protocol] ERROR | Failed to handle request 1 textDocument/documentSymbol DocumentSymbolParams(text_document=TextDocumentIdentifier(uri='file:///Users/seem/posit/demos/completions.py'), work_done_token=None, partial_result_token=None)
[Python] Traceback (most recent call last):
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/pygls/protocol.py", line 400, in _handle_request
[Python]     self._execute_request(msg_id, handler, params)
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/pygls/protocol.py", line 322, in _execute_request
[Python]     self._send_response(msg_id, handler(params))
[Python]                                 ^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/positron/positron_jedilsp.py", line 557, in positron_document_symbol
[Python]     return document_symbol(server, params)
[Python]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi_language_server/server.py", line 435, in document_symbol
[Python]     jedi_script = jedi_utils.script(server.project, document)
[Python]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi_language_server/jedi_utils.py", line 118, in script
[Python]     return Script(code=document.source, path=document.path, project=project)
[Python]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi/api/__init__.py", line 119, in __init__
[Python]     self._inference_state = InferenceState(
[Python]                             ^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi/inference/__init__.py", line 87, in __init__
[Python]     environment = project.get_environment()
[Python]                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi/api/project.py", line 245, in get_environment
[Python]     self._environment = create_environment(self._environment_path, safe=False)
[Python]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi/api/environment.py", line 367, in create_environment
[Python]     return Environment(_get_executable_path(path, safe=safe), env_vars=env_vars)
[Python]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Python]   File "/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/pythonFiles/lib/jedilsp/jedi/api/environment.py", line 380, in _get_executable_path
[Python]     raise InvalidPythonEnvironment("%s seems to be missing." % python)
[Python] jedi.api.environment.InvalidPythonEnvironment: bin/python seems to be missing.

@seeM
Contributor

seeM commented Dec 19, 2023

This is ready for review. I believe the error mentioned above is unrelated - this still works for me in the latest build.

@juliasilge
Contributor

In Positron 2023.12.0 (Universal) build 1644 I can get column completions via dict access (df[') for polars and pandas, and completions via attributes (df.) for pandas:

columns.mov

I haven't worked a ton with polars but looks like "columns as attributes" is not a thing, right? I believe so, so I am closing as complete. ✅

@seeM
Contributor

seeM commented Dec 22, 2023

@juliasilge yep, polars doesn't do "columns as attributes"

wesm pushed a commit that referenced this issue Mar 28, 2024
Merge pull request #265 from posit-dev/prefer-namespace-completions

Prefer namespace completions
--------------------
Commit message for posit-dev/positron-python@8359792:

test that namespace completions are preferred

--------------------
Commit message for posit-dev/positron-python@1d3855a:

prefer completions using the user's namespace over static analysis

Relates to #601.


Authored-by: Wasim Lorgat <[email protected]>
Signed-off-by: Wasim Lorgat <[email protected]>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2024