Improve lookup documentation #61471

Open
wants to merge 2 commits into
base: main

stevenae

Follows from #61185

Examples available at https://colab.research.google.com/drive/1MGWX6JVJL5yHyK7BeEBPQAW4tLM3TZL9#scrollTo=DjWfk4i1SiOY

stevenae added 2 commits May 21, 2025 10:46
Add pd_lookup_het() and pd_lookup_hom()
Member

@rhshadrach rhshadrach left a comment

Thanks for the PR! No strong opposition to having both functions, but the performance gain of the _het version does not seem significant to me.
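As background for this review, the factorize-based recipe that appears in the diff context can be run on a small frame to see what a "lookup" returns. This is only an illustrative sketch: the sample frame and the name `picked` are assumptions and are not taken from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: 'col' names the column to read from in each row.
df = pd.DataFrame(
    {
        "col": ["A", "A", "B", "B"],
        "A": [80.0, 23.0, np.nan, 22.0],
        "B": [80, 55, 76, 67],
    }
)

# idx[i] is the position of df['col'][i] within cols (the unique labels).
idx, cols = pd.factorize(df["col"])

# Reorder the columns to match cols, then pick one value per row.
picked = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(picked)
```

Rows 0 and 1 read from column `A`, rows 2 and 3 from column `B`, so the `NaN` in `A` is never selected.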

idx, cols = pd.factorize(df['col'])
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
def pd_lookup_hom(df, row_labels, col_labels):
    rows = df.index.get_indexer(row_labels)
Member

Can you add df = df.loc[:, sorted(set(col_labels))] here?
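One likely motivation for this suggestion is that converting a frame with mixed column types to NumPy falls back to `object` dtype, whereas converting only the requested columns can keep a numeric dtype. A small sketch of that effect follows; the sample frame and the names `full`, `subset`, and `narrow` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string column and two float columns.
df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],
        "a": [1.0, 2.0, 3.0],
        "b": [4.0, 5.0, 6.0],
    }
)
col_labels = ["b", "a", "b"]

# Converting every column forces a common dtype across str and float: object.
full = df.to_numpy()

# Keeping only the columns that are actually looked up avoids that upcast.
subset = df.loc[:, sorted(set(col_labels))]
narrow = subset.to_numpy()

print(full.dtype, narrow.dtype)
```

Note that `sorted(set(col_labels))` also deduplicates the labels, so each needed column is converted once even if it is requested many times.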

    result = values[flat_index]
    return result
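The hunks above show only fragments of the function body. A minimal sketch of how a homogeneous-dtype lookup along these lines could be completed is given below. Everything between the quoted lines is an assumption, including the use of `np.ravel_multi_index` to build `flat_index` and the hypothetical name `lookup_hom_sketch`; it is not the PR's actual code.

```python
import numpy as np
import pandas as pd


def lookup_hom_sketch(df, row_labels, col_labels):
    """Pick df.loc[row_labels[i], col_labels[i]] for each i (illustrative only)."""
    # get_indexer maps labels to integer positions and returns -1 for misses.
    # It requires the index and the columns to be unique.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("One or more row or column labels were not found")

    # Flatten in row-major order and compute matching row-major flat positions.
    values = df.to_numpy().ravel()
    flat_index = np.ravel_multi_index((rows, cols), df.shape)
    result = values[flat_index]
    return result


df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]},
    index=["r0", "r1", "r2"],
)
out = lookup_hom_sketch(df, ["r0", "r2", "r1"], ["B", "A", "B"])
print(out)
```

The explicit `-1` check matters because a missing label would otherwise silently index from the end of the array.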

For homogeneous column types, it is fastest to skip column subsetting and go directly to numpy:
Member

Nit: NumPy


.. ipython:: python
For heterogeneous column types, we subset columns to avoid unnecessary numpy conversions:
Member

NumPy again.
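For the heterogeneous case described in the quoted documentation line, a lookup that subsets columns before converting to NumPy might look roughly like the sketch below. The PR's `pd_lookup_het()` body is not shown on this page, so the name `lookup_het_sketch` and the whole implementation are assumptions.

```python
import numpy as np
import pandas as pd


def lookup_het_sketch(df, row_labels, col_labels):
    """Label-based lookup that converts only the requested columns (illustrative)."""
    # Restrict to the columns actually requested so unrelated dtypes
    # (for example strings) do not force an object-dtype conversion.
    sub = df.loc[:, sorted(set(col_labels))]

    # Positions are computed against the subset, not the original frame.
    rows = sub.index.get_indexer(row_labels)
    cols = sub.columns.get_indexer(col_labels)
    if (rows == -1).any():
        raise KeyError("One or more row labels were not found")

    # Paired integer indexing selects one element per (row, col) pair.
    return sub.to_numpy()[rows, cols]


df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],
        "a": [1, 2, 3],
        "b": [0.5, 1.5, 2.5],
    }
)
out = lookup_het_sketch(df, [0, 1, 2], ["a", "b", "a"])
print(out, out.dtype)
```

Because only the integer column `a` and the float column `b` are converted, the result here is a float array rather than an object array.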

@rhshadrach rhshadrach added the Indexing Related to indexing on series/frames, not to indexes themselves label May 21, 2025
@rhshadrach rhshadrach added this to the 3.0 milestone May 21, 2025
Labels
Docs Indexing Related to indexing on series/frames, not to indexes themselves
Development

Successfully merging this pull request may close these issues.

ENH: re-implement DataFrame.lookup.
3 participants