CLN: get_flattened_iterator #35515
Merged: jreback merged 17 commits into pandas-dev:master from mroeschke:cln/get_flattened_iterator on Aug 6, 2020
Changes from 10 commits
Commits
2f5c450 CLN: Simplify get_flattened_iterator
071378d Yield correct value
b3af159 isort
1c581e6 Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
1cb2fed Rename to get_flattened_list
dd8263c Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
a09af1b Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
52b938c store intermediate arrays for performance
775ea23 Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
f889efd typing
8e83f5a Add better typing
0ce1136 type levels
4584c6d Change noqa code
68273ab Add another noqa
7793b9d use iterable instead of list
58ddb7e Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
eb2be1b Merge remote-tracking branch 'upstream/master' into cln/get_flattened…
@@ -1,5 +1,6 @@
 """ miscellaneous sorting / groupby utilities """
-from typing import Callable, Optional
+from collections import defaultdict
+from typing import Callable, Dict, List, Optional, Tuple
 
 import numpy as np
 
@@ -440,36 +441,18 @@ def ensure_key_mapped(values, key: Optional[Callable], levels=None):
     return result
 
 
-class _KeyMapper:
-    """
-    Map compressed group id -> key tuple.
-    """
-
-    def __init__(self, comp_ids, ngroups: int, levels, labels):
-        self.levels = levels
-        self.labels = labels
-        self.comp_ids = comp_ids.astype(np.int64)
-
-        self.k = len(labels)
-        self.tables = [hashtable.Int64HashTable(ngroups) for _ in range(self.k)]
-
-        self._populate_tables()
-
-    def _populate_tables(self):
-        for labs, table in zip(self.labels, self.tables):
-            table.map(self.comp_ids, labs.astype(np.int64))
-
-    def get_key(self, comp_id):
-        return tuple(
-            level[table.get_item(comp_id)]
-            for table, level in zip(self.tables, self.levels)
-        )
-
-
-def get_flattened_iterator(comp_ids, ngroups, levels, labels):
-    # provide "flattened" iterator for multi-group setting
-    mapper = _KeyMapper(comp_ids, ngroups, levels, labels)
-    return [mapper.get_key(i) for i in range(ngroups)]
+def get_flattened_list(
+    comp_ids: np.ndarray, ngroups: int, levels, labels: List[np.ndarray]
+) -> List[Tuple]:
+    """Map compressed group id -> key tuple."""
+    comp_ids = comp_ids.astype(np.int64, copy=False)
+    arrays: Dict[int, List] = defaultdict(list)
+    for labs, level in zip(labels, levels):
+        table = hashtable.Int64HashTable(ngroups)
+        table.map(comp_ids, labs.astype(np.int64, copy=False))
+        for i in range(ngroups):
+            arrays[i].append(level[table.get_item(i)])
+    return [tuple(array) for array in arrays.values()]
 
 
 def get_indexer_dict(label_list, keys):

Review comment (inline, shown after the `table.map(comp_ids, labs.astype(np.int64, copy=False))` line): you could make this a list-comprehension, maybe it would be slightly less readable though
Review comment: This exists in the typing module. Can the `List` be `List[int]`?

Review comment: #30539 (comment)
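For readers unfamiliar with the annotation under discussion, the sketch below shows one way a `defaultdict(list)` could be typed. It assumes the first remark refers to `typing.DefaultDict`, which the comment itself does not state, and the `List[Any]` element type is an illustrative choice rather than the resolution reached in the linked thread.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List

# Hypothetical annotation: keys are group ids, values are per-group lists.
# The element type depends on what the levels contain, so Any is used here.
arrays: DefaultDict[int, List[Any]] = defaultdict(list)

arrays[0].append("a")  # value taken from a first level
arrays[0].append(1)    # value taken from a second level
key = tuple(arrays[0])
```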