Skip to content

BUG: duplicate indexing with embedded non-orderables (#17610) #18609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

gloryfromca
Copy link
Contributor

@gloryfromca gloryfromca commented Dec 3, 2017

@codecov
Copy link

codecov bot commented Dec 3, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@c1af9a8). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18609   +/-   ##
=========================================
  Coverage          ?   91.56%           
=========================================
  Files             ?      153           
  Lines             ?    51276           
  Branches          ?        0           
=========================================
  Hits              ?    46950           
  Misses            ?     4326           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.42% <100%> (?)
#single 40.68% <0%> (?)
Impacted Files Coverage Δ
pandas/core/series.py 94.82% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1af9a8...ed4dcaa. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Dec 3, 2017

@gloryfromca no need to create a NEW PR every time this needs updating. simply merge master and push to THIS one.

@@ -183,6 +183,7 @@ Indexing
- Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
- Bug in ``IntervalIndex.symmetric_difference()`` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
- Bug in ``Series`` containing duplicate indexing when gets embedded non-orderables or orderables, raises error or returns unexpected result. (:issue:`17610`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very clear what you are fixing here, be a little more consise.

@@ -657,12 +657,12 @@ def __getitem__(self, key):
try:
result = self.index.get_value(self, key)

if not is_scalar(result):
if not is_scalar(result) and key in self.index:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this check necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will cause that some tests fail, if not is_scalar(self.index.get_loc(key)) will raise ValueError when type of key is Int not String. So by using [0] is another recommended way to get the first item with a String Index in Series? I will try to use try...except... to resolve it , Is this way good or a recommended way?
Or I should adjust three failed tests?

  • test_applymap in pandas.tests.frame.test_apply.TestDataFrameApply.py,
  • test_match_findall_flags in pandas.tests.test_strings.TestStringMethods.py,
  • test_constructor_from_items in pandas.tests.frame.test_constructors.TestDataFrameConstructors.py
    I'm poor in programing skills and time, sorry about that {:-()}.

@@ -546,6 +546,22 @@ def test_getitem_setitem_periodindex(self):
result[4:8] = ts[4:8]
assert_series_equal(result, ts)

def test_getitem_with_duplicates_indices(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, parameterize this , what should I do ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s = s.append(pd.Series({1: 313}))
s_1 = pd.Series({1: 12, },)
s_1 = s_1.append(pd.Series({1: 313}))
assert_series_equal(s[1], s_1, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use check_dtype=False, your expected should account for this

# GH 17610
s = pd.Series({1: 12, 2: [1, 2, 2, 3]})
s = s.append(pd.Series({1: 313}))
s_1 = pd.Series({1: 12, },)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result and expected to indicate what you are comparing

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 3, 2017
@@ -666,11 +666,13 @@ def __getitem__(self, key):

# we need to box if we have a non-unique index here
# otherwise have inline ndarray/lists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this comment a bit, its not longer non-unique, but getting back a scalar loc for the key, IOW its a unique key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it unique key?

result, index=[key] * len(result),
dtype=self.dtype).__finalize__(self)
except KeyError:
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass on the KeyError (it will fall thru and return result anyhow)

@gloryfromca
Copy link
Contributor Author

@jreback what else can I do for this?

@@ -269,6 +269,8 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in indexing a datetimelike ``Index`` that raised ``ValueError`` instead of ``IndexError`` (:issue:`18386`).
- Bug in tz-aware :class:`DatetimeIndex` where addition/subtraction with a :class:`TimedeltaIndex` or array with ``dtype='timedelta64[ns]'`` was incorrect (:issue:`17558`)
- Bug in indexing non_scalar item with unique index in ``Series`` containing duplicate index, returns ``Series`` wrapping value flatted. (:issue:`17610`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so confusing. pls make it simpler.

],
])
def test_getitem_with_duplicates_indices(
self, result_1, duplicate_item,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the duplicate_key and unique_key are the same in each parametrization yes? so don't include them, just directly index in the test body

@jreback jreback closed this in 7a1b0ee Dec 21, 2017
@jreback jreback added this to the 0.23.0 milestone Dec 21, 2017
@jreback
Copy link
Contributor

jreback commented Dec 21, 2017

thanks @gloryfromca

@gloryfromca
Copy link
Contributor Author

gloryfromca commented Dec 21, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: duplicate indexing with embedded non-orderables
2 participants