ENH: allow dataframe get to take an axis argument #11550

Closed
wants to merge 3 commits into from
60 changes: 44 additions & 16 deletions pandas/core/generic.py
@@ -1062,23 +1062,51 @@ def _indexer(self):
# add to our internal names set
cls._internal_names_set.add(iname)

def get(self, key, default=None):
def get(self, key, default=None, axis=None):
"""
Get item from object for given key (DataFrame column, Panel slice,
etc.). Returns default value if not found
etc.) along the given axis. Returns default value if not found

Parameters
----------
key : object
default : object, optional
Value to return if key is not present
axis : int or None
The axis to filter on. By default this is the info axis. The "info
axis" is the axis that is used when indexing with ``[]``. For
example, ``df = DataFrame({'a': [1, 2, 3, 4]})``

Returns
-------
value : type of items contained in object
value : type of items contained in object, or default
"""
# special case (GH 5652)
Contributor

Not needed; this just raises a ValueError, which by definition returns default.

Contributor Author

That generates a TypeError ("cannot use label indexing with a null key"). But you're right, I can also catch it below...

if key is None:
return default
# general case
if axis is None:
axis = self._info_axis_number
else:
Contributor

more idiomatic is

if axis is None:
    axis = self._info_axis_number
axis = self._get_axis_number(axis)
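
The suggestion above can be tried outside pandas. The following is a minimal sketch, assuming a toy `Frame` class with a hypothetical axis-name mapping. Only the names `_info_axis_number` and `_get_axis_number` come from the diff; their bodies here are invented for illustration and are not the pandas implementation. It shows why the `else:` branch is unnecessary: the default is filled in first, and normalization then runs unconditionally.

```python
class Frame:
    """Toy stand-in for an NDFrame-like object (not the pandas class)."""

    # Hypothetical name-to-number mapping, assumed for this sketch only.
    _AXIS_NUMBERS = {"index": 0, "columns": 1}
    _info_axis_number = 1

    def _get_axis_number(self, axis):
        # Accept an axis name or an axis number; reject anything else.
        if axis in self._AXIS_NUMBERS:
            return self._AXIS_NUMBERS[axis]
        if axis in self._AXIS_NUMBERS.values():
            return axis
        raise ValueError(f"No axis named {axis!r}")

    def resolve_axis(self, axis=None):
        # Fill in the default, then normalize in every case (no else branch).
        if axis is None:
            axis = self._info_axis_number
        return self._get_axis_number(axis)


frame = Frame()
default_axis = frame.resolve_axis()
named_axis = frame.resolve_axis("index")
```

Running the normalization on the default as well is harmless, because `_info_axis_number` is already a valid axis number.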

axis = self._get_axis_number(axis)
slices = [slice(None)] * self.ndim
slices[axis] = key
try:
return self[key]
return self.loc[tuple(slices)]
except (KeyError, ValueError, IndexError):
return default
pass
# Two possibilities:
# 1) the key is not present, and we should return default
Contributor

Way too convoluted. Show an example of this case.

# 2) self.loc rejects our indexer (which we need in order to honor the
# axis parameter). This happens, for instance, with a Series that has a
# string index and a negative key.
# To cover this last case, we revert to the previous implementation:
if axis == self._info_axis_number:
try:
return self[key]
except (KeyError, ValueError, IndexError):
pass
return default
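
As a rough illustration of the indexer construction and the default-on-failure pattern used in the diff above, the sketch below reproduces both in plain Python. `build_indexer` and `safe_get` are hypothetical helper names introduced here, not pandas functions, and the containers are ordinary dicts and lists rather than pandas objects.

```python
def build_indexer(ndim, axis, key):
    """Return a tuple that selects `key` on `axis` and everything elsewhere."""
    slices = [slice(None)] * ndim  # one "take all" slice per dimension
    slices[axis] = key             # replace the slot for the requested axis
    return tuple(slices)


def safe_get(obj, key, default=None):
    """Look up `key`, returning `default` for the same errors the diff catches."""
    try:
        return obj[key]
    except (KeyError, ValueError, IndexError):
        return default


row_indexer = build_indexer(2, 0, "x")  # select "x" along axis 0
col_indexer = build_indexer(2, 1, "x")  # select "x" along axis 1
found = safe_get({"a": 1}, "a")
fallback = safe_get({"a": 1}, "b", default=0)
```

Note that a `TypeError` is not in the caught tuple, which is why the author's reply about a null key mentions handling it separately.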

def __getitem__(self, item):
return self._get_item_cache(item)
@@ -1779,7 +1807,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
avoid duplicating data
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* default: don't fill gaps
* pad / ffill: propagate last valid observation forward to next valid
@@ -1822,7 +1850,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,

Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned ``NaN``.
records in the dataframe are assigned ``NaN``.

>>> new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
... 'Chrome']
@@ -1836,8 +1864,8 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,

We can fill in the missing values by passing a value to
the keyword ``fill_value``. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
``method`` to fill the ``NaN`` values.
increasing or decreasing, we cannot use arguments to the keyword
``method`` to fill the ``NaN`` values.

>>> df.reindex(new_index, fill_value=0)
http_status response_time
@@ -1855,8 +1883,8 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
IE10 404 0.08
Chrome 200 0.02

To further illustrate the filling functionality in
``reindex``, we will create a dataframe with a
To further illustrate the filling functionality in
``reindex``, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).

@@ -1873,7 +1901,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
2010-01-06 88

Suppose we decide to expand the dataframe to cover a wider
date range.
date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
@@ -1890,10 +1918,10 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
2010-01-07 NaN

The index entries that did not have a value in the original data frame
(for example, '2009-12-29') are by default filled with ``NaN``.
(for example, '2009-12-29') are by default filled with ``NaN``.
If desired, we can fill in the missing values using one of several
options.
options.

For example, to backpropagate the last valid value to fill the ``NaN``
values, pass ``bfill`` as an argument to the ``method`` keyword.

@@ -1911,7 +1939,7 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
2010-01-07 NaN

Please note that the ``NaN`` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the ``NaN`` values present
8 changes: 8 additions & 0 deletions pandas/tests/test_generic.py
@@ -99,6 +99,14 @@ def test_rename(self):

# multiple axes at once

def test_get(self):
# GH 6703
# testing the axis parameter
Contributor

Needs generic testing on Series, DataFrame, and Panel for all axes.

df = DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
x = df.set_index(['a', 'b'])
assert_series_equal(x.get((1, 1), axis=0), x.T.get((1, 1)))
assert_series_equal(x.get('c', axis=1), x.get('c'))
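
For readers who want to reproduce what this test compares without the proposed `axis` argument, here is a minimal sketch that uses only the existing `get(key, default)` signature plus `.loc`. Treating `x.loc[(1, 1), :]` as a stand-in for the proposed `x.get((1, 1), axis=0)` is an assumption of this sketch, not something the PR states.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
x = df.set_index(['a', 'b'])

# Row lookup along axis 0 via .loc (stand-in for the proposed axis=0 call).
row = x.loc[(1, 1), :]

# The same data reached through the transpose, where (1, 1) is a column label.
via_transpose = x.T.get((1, 1))

# Column lookup on the info axis, and the default returned for a missing key.
col = x.get('c')
missing = x.get('no_such_column', default=-1)
```

The first two lookups should describe the same row, which is the equivalence the first assertion in the test relies on.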

def test_get_numeric_data(self):

n = 4