BUG: duplicate indexing with embedded non-orderables (#17610) #18609

gloryfromca · 2017-12-03T13:04:57Z

tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
closes BUG: duplicate indexing with embedded non-orderables #17610

codecov · 2017-12-03T13:52:45Z

Codecov Report

❗ No coverage uploaded for pull request base (master@c1af9a8).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #18609   +/-   ##
=========================================
  Coverage          ?   91.56%           
=========================================
  Files             ?      153           
  Lines             ?    51276           
  Branches          ?        0           
=========================================
  Hits              ?    46950           
  Misses            ?     4326           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`89.42% <100%> (?)`
#single	`40.68% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/series.py	`94.82% <100%> (ø)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c1af9a8...ed4dcaa.

jreback · 2017-12-03T16:22:15Z

@gloryfromca no need to create a NEW PR every time this needs updating. simply merge master and push to THIS one.

jreback · 2017-12-03T16:22:42Z

doc/source/whatsnew/v0.22.0.txt

@@ -183,6 +183,7 @@ Indexing
 - Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
 - Bug in ``IntervalIndex.symmetric_difference()`` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
 - Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
+- Bug in ``Series`` containing duplicate indexing when gets embedded non-orderables or orderables, raises error or returns unexpected result. (:issue:`17610`)


not very clear what you are fixing here, be a little more concise.

jreback · 2017-12-03T16:23:12Z

pandas/core/series.py

@@ -657,12 +657,12 @@ def __getitem__(self, key):
        try:
            result = self.index.get_value(self, key)

-            if not is_scalar(result):
+            if not is_scalar(result) and key in self.index:


why is this check necessary?

It causes some tests to fail: if not is_scalar(self.index.get_loc(key)) raises ValueError when the key is an int rather than a string. Is using [0] another recommended way to get the first item of a Series with a string index? I will try to resolve it with try...except. Is that a good or recommended approach?
Or should I adjust the three failing tests?

test_applymap in pandas.tests.frame.test_apply.TestDataFrameApply.py,

test_match_findall_flags in pandas.tests.test_strings.TestStringMethods.py,

test_constructor_from_items in pandas.tests.frame.test_constructors.TestDataFrameConstructors.py
I'm short on programming skill and time, sorry about that {:-()}.
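
For readers following the question about the scalar check, here is a minimal sketch of how `Index.get_loc` answers for a unique versus a duplicated label. It assumes a recent pandas release; the sample index is made up for illustration and this is not the code from the PR.

```python
import pandas as pd
from pandas.api.types import is_scalar

# Hypothetical index: label 1 appears twice, label 2 appears once.
idx = pd.Index([1, 2, 1])

# A label with a single match resolves to one integer position.
unique_loc = idx.get_loc(2)

# A label with several matches resolves to a mask or slice, not a scalar.
duplicate_loc = idx.get_loc(1)

print(is_scalar(unique_loc))
print(is_scalar(duplicate_loc))
```

A scalar location therefore signals that the key itself is unique, even when the index as a whole contains duplicates.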

jreback · 2017-12-03T16:23:26Z

pandas/tests/series/test_indexing.py

@@ -546,6 +546,22 @@ def test_getitem_setitem_periodindex(self):
        result[4:8] = ts[4:8]
        assert_series_equal(result, ts)

+    def test_getitem_with_duplicates_indices(self):


parameterize this

Sorry, what should I do to parameterize this?

http://pandas.pydata.org/pandas-docs/stable/contributing.html#using-pytest
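
As a rough illustration of what the linked guide describes, a parametrized test might look like the sketch below. The test name, the `embedded` parameter and the sample data are hypothetical and are not the test that was merged; a recent pandas and pytest are assumed.

```python
import pandas as pd
import pytest


# Each value in the list becomes its own test case in the pytest report.
@pytest.mark.parametrize("embedded", [
    [1, 2, 2, 3],   # list
    (1, 2, 3),      # tuple
    {"a": 1},       # dict
])
def test_getitem_embedded_value(embedded):
    # Hypothetical data: label 1 is duplicated, label 2 is unique.
    s = pd.Series([12, embedded, 313], index=[1, 2, 1], dtype=object)

    result = s.loc[2]

    # The unique label should hand back the embedded object unchanged.
    assert result == embedded
```

Only the value that varies between cases is a parameter; the labels used for indexing stay in the test body.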

jreback · 2017-12-03T16:23:47Z

pandas/tests/series/test_indexing.py

+        s = s.append(pd.Series({1: 313}))
+        s_1 = pd.Series({1: 12, },)
+        s_1 = s_1.append(pd.Series({1: 313}))
+        assert_series_equal(s[1], s_1, check_dtype=False)


don't use check_dtype=False, your expected should account for this

jreback · 2017-12-03T16:23:59Z

pandas/tests/series/test_indexing.py

+        # GH 17610
+        s = pd.Series({1: 12, 2: [1, 2, 2, 3]})
+        s = s.append(pd.Series({1: 313}))
+        s_1 = pd.Series({1: 12, },)


use result and expected to indicate what you are comparing
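
A small sketch of the `result` / `expected` convention, which also builds `expected` with the matching dtype instead of passing `check_dtype=False`. The data is illustrative only, and the sketch assumes a recent pandas where `pandas.testing.assert_series_equal` is available.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Hypothetical series: label 1 is duplicated and one value is a list.
s = pd.Series([12, [1, 2, 2, 3], 313], index=[1, 2, 1], dtype=object)

result = s.loc[1]

# Give expected the dtype the result really has (object here),
# rather than relaxing the comparison.
expected = pd.Series([12, 313], index=[1, 1], dtype=object)

assert_series_equal(result, expected)
```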

jreback · 2017-12-09T15:52:35Z

pandas/core/series.py

@@ -666,11 +666,13 @@ def __getitem__(self, key):

                    # we need to box if we have a non-unique index here
                    # otherwise have inline ndarray/lists


can you update this comment a bit, it's no longer non-unique, but getting back a scalar loc for the key, IOW it's a unique key

Is it a unique key?

jreback · 2017-12-09T15:54:30Z

pandas/core/series.py

+                                result, index=[key] * len(result),
+                                dtype=self.dtype).__finalize__(self)
+                    except KeyError:
+                        return result


you can pass on the KeyError (it will fall thru and return result anyhow)

gloryfromca · 2017-12-17T07:15:51Z

@jreback what else can I do for this?

jreback · 2017-12-18T13:54:45Z

doc/source/whatsnew/v0.22.0.txt

@@ -269,6 +269,8 @@ Indexing
 - Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
 - Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
 - Bug in tz-aware :class:`DatetimeIndex` where addition/subtraction with a :class:`TimedeltaIndex` or array with ``dtype='timedelta64[ns]'`` was incorrect (:issue:`17558`)
+- Bug in indexing non_scalar item with unique index in ``Series`` containing duplicate index, returns ``Series`` wrapping value flatted. (:issue:`17610`)


this is so confusing. pls make it simpler.

jreback · 2017-12-18T13:56:17Z

pandas/tests/series/test_indexing.py

+            ],
+        ])
+    def test_getitem_with_duplicates_indices(
+            self, result_1, duplicate_item,


the duplicate_key and unique_key are the same in each parametrization yes? so don't include them, just directly index in the test body

jreback · 2017-12-21T15:29:02Z

thanks @gloryfromca

gloryfromca · 2017-12-21T15:57:35Z

:)

jreback requested changes Dec 3, 2017

jreback added the labels Compat (pandas objects compatibility with NumPy or Python functions) and Indexing (related to indexing on series/frames, not to indexes themselves) Dec 3, 2017

gloryfromca force-pushed the master branch from 81261d4 to 499dc68 Compare December 8, 2017 13:18

jreback requested changes Dec 9, 2017

gloryfromca force-pushed the master branch from 499dc68 to 5a82391 Compare December 10, 2017 08:02

jreback requested changes Dec 18, 2017

BUG: duplicate indexing with embedded non-orderables (pandas-dev#17610)

ed4dcaa

gloryfromca force-pushed the master branch from 5a82391 to ed4dcaa Compare December 20, 2017 12:20

jreback closed this in 7a1b0ee Dec 21, 2017

jreback added this to the 0.23.0 milestone Dec 21, 2017
