Skip to content

Support dicts with default values in series.map #16002

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 15, 2017

Conversation

dhimmel
Copy link
Contributor

@dhimmel dhimmel commented Apr 14, 2017

Closes #15999

Still a work in progress.

if isinstance(arg, (dict, Series)):
if isinstance(arg, dict):
arg = self._constructor(arg, index=arg.keys())
default_dict_types = collections.Counter, collections.defaultdict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of actually naming these, use issubclass(type(arg), dict) and not type(arg) is dict

In [17]: f = lambda arg: issubclass(type(arg), dict) and not type(arg) is dict

In [18]: f({})
Out[18]: False

In [19]: f(collections.defaultdict())
Out[19]: True

In [21]: f(collections.Counter())
Out[21]: True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issubclass(type(arg), dict) and not type(arg) is dict

This would evaluate as True for a collections.OrderedDict which should behave like a plain dict.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

always write the tests first!

@codecov
Copy link

codecov bot commented Apr 14, 2017

Codecov Report

Merging #16002 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16002      +/-   ##
==========================================
+ Coverage   91.02%   91.02%   +<.01%     
==========================================
  Files         145      145              
  Lines       50391    50394       +3     
==========================================
+ Hits        45870    45873       +3     
  Misses       4521     4521
Flag Coverage Δ
#multiple 88.83% <100%> (ø) ⬆️
#single 40.33% <40%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 95.09% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d9451d...79cfd11. Read the comment docs.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 14, 2017

always write the tests first!

@jreback good idea (assuming this is so we can check that the tests fail before the enhancement).

I rebased to put the testing commit first. Unfortunately, the GitHub PR displays orders by commit time rather than order. Also CI is only building the tip commit.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

u need to duck type the test rather than hard coding things

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 14, 2017

u need to duck type the test rather than hard coding things

I dug a bit deeper. The __missing__ method is how dict subclasses can return default values. Switched to this method in 961ea46.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

great

make sure to test a dict subclass with and w/o missimg

document in doc string

add a note in other api changes section

@jreback jreback added Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 14, 2017
@@ -2132,10 +2132,14 @@ def map_f(values, f):
else:
map_f = lib.map_infer

if isinstance(arg, (dict, Series)):
if isinstance(arg, dict):
if isinstance(arg, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a 1-line comment here on what you are doing

@@ -449,10 +449,8 @@ Other Enhancements
- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
- ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)


- ``Series.map()`` now respects default values of dictionary subclasses with a ``__missing__`` method, such as ``collections.Counter`` (:issue:`15999`, :issue:`16002`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only list the issue (and not the PR)

Copy link
Contributor

@jreback jreback Apr 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an api change (and not an enhancement), ignore how I labelled the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the note to "Other API Changes". Is this okay, or should I make a larger entry under:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

if isinstance(arg, dict):
if hasattr(arg, '__missing__'):
# If a dictionary subclass defines a default value method,
# convert arg to a lookup function (https://git.io/vS7LK).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just add GH #15999 instead of a short url

@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

small comments, other lgtm.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 14, 2017

Confirmed locally that the following tests fail under the old code for Series.map by running py.test --quiet pandas/tests/series/test_apply.py:

  • TestSeriesMap.test_map_counter
  • TestSeriesMap.test_map_defaultdict
  • TestSeriesMap.test_map_dict_subclass_with_missing

Removing WIP from title.

@dhimmel dhimmel changed the title WIP: Support dicts with default values in series.map Support dicts with default values in series.map Apr 14, 2017
@@ -449,10 +449,7 @@ Other Enhancements
- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
- ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback let me know if you want me to revert the deletion of these blank lines.

@@ -2089,21 +2089,33 @@ def map(self, arg, na_action=None):
two B
three C

Values in Series that are not in the dictionary (as keys) are converted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this in Notes section.

defines ``__missing__`` (i.e. provides a method for default values),
then this default is used rather than ``NaN``:

>>> from collections import Counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this example at the end

assert_series_equal(result, expected)

def test_map_dict_subclass_with_missing(self):
class DictWithMissing(dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number here as a comment

@jreback jreback added this to the 0.20.0 milestone Apr 14, 2017
@jreback
Copy link
Contributor

jreback commented Apr 14, 2017

couple minor doc comments. ping on green.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 15, 2017

>>> "It's \N{green apple}"
"It's 🍏"

@jreback also known as PR is passing all checks

@jreback jreback merged commit 61d84db into pandas-dev:master Apr 15, 2017
@jreback
Copy link
Contributor

jreback commented Apr 15, 2017

thanks!

@dhimmel dhimmel deleted the map-default-dict branch April 15, 2017 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants