ENH add sample #2419 #7274
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1241,6 +1241,45 @@ def take(self, indices, axis=0, convert=True, is_copy=True): | |
|
||
return result | ||
|
||
def sample(self, size, replace=True): | ||
"""Take a sample from the object, analogue of numpy.random.choice | ||
|
||
Parameters | ||
---------- | ||
size : int, size of sample to take | ||
Review comment: Can you adhere to the numpy docstring standard?
|
||
replace : bool, default True, whether to sample with replacements | ||
|
||
Returns | ||
------- | ||
type of caller | ||
|
||
Examples | ||
-------- | ||
>>> s = pd.Series([1, 2, 3, 4, 5]) | ||
>>> s.sample(3, replace=False) | ||
2 3 | ||
0 1 | ||
3 4 | ||
dtype: int64 | ||
>>> s.sample(3, replace=True) | ||
1 2 | ||
3 4 | ||
1 2 | ||
dtype: int64 | ||
|
||
Note | ||
---- | ||
If you are sampling without replacement over a larger sample size than | ||
the object you're sampling a ValueError will be raised. | ||
Review comment: Maybe we can also add that this is equivalent/related to the |
||
|
||
""" | ||
try: | ||
from numpy.random import choice | ||
except ImportError: | ||
from pandas.stats.misc import choice | ||
msk = choice(len(self), size, replace=replace) | ||
return self.iloc[msk] | ||
Review comment: This should use |
||
|
||
def xs(self, key, axis=0, level=None, copy=None, drop_level=True): | ||
""" | ||
Returns a cross-section (row(s) or column(s)) from the Series/DataFrame. | ||
|
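
The `sample` method in the diff above draws integer positions with `numpy.random.choice` and selects them with `.iloc`. A minimal standalone sketch of that idea follows, with the docstring laid out in numpy docstring style as the first review comment asks. The function name `sample_rows` and the `seed` argument are hypothetical and are not part of the PR.

```python
import numpy as np
import pandas as pd


def sample_rows(obj, size, replace=True, seed=None):
    """Take a random sample of rows from a Series or DataFrame.

    Parameters
    ----------
    obj : pandas.Series or pandas.DataFrame
        Object to sample from.
    size : int
        Number of rows to draw.
    replace : bool, default True
        Whether to sample with replacement.
    seed : int, optional
        Seed for the random generator (hypothetical convenience argument).

    Returns
    -------
    pandas.Series or pandas.DataFrame
        Same type as ``obj``.

    Raises
    ------
    ValueError
        If ``replace`` is False and ``size`` exceeds ``len(obj)``.
    """
    rng = np.random.RandomState(seed)
    # Draw integer positions first, then select rows by position.
    positions = rng.choice(len(obj), size, replace=replace)
    return obj.iloc[positions]


s = pd.Series([1, 2, 3, 4, 5])
without = sample_rows(s, 3, replace=False, seed=0)   # 3 distinct rows
with_repl = sample_rows(s, 8, replace=True, seed=0)  # rows may repeat
```

Sampling by position rather than by label keeps the sketch correct for objects whose index contains duplicate labels.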
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -297,3 +297,20 @@ def _bucket_labels(series, k): | |
mat[v] = i | ||
|
||
return mat + 1 | ||
|
||
|
||
def choice(arr, size, replace): | ||
Review comment: Forgot that arr can be an int. Also, I should make this private.
Review comment: We don't need this anymore since we require numpy 1.7 now. |
||
"""Partial implementation of numpy.random.choice which is new to 1.7 | ||
|
||
Note: unlike numpy's version size must be a scalar. | ||
""" | ||
if replace: | ||
pos = (np.random.sample(size) * len(arr)).astype('int64') | ||
return arr[pos] | ||
else: | ||
if size > len(arr): | ||
raise ValueError("Cannot take a larger sample than " | ||
"population when 'replace=False'") | ||
shuffle = np.arange(len(arr)) | ||
np.random.shuffle(shuffle) | ||
return arr[shuffle[:size]] |
Review comment: Can you make sample a link? :func:`~DataFrame.sample`?