Skip to content

BUG: NaN level values in stack() and unstack() #9406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seth-p opened this issue Feb 3, 2015 · 5 comments · Fixed by #41389
Closed

BUG: NaN level values in stack() and unstack() #9406

seth-p opened this issue Feb 3, 2015 · 5 comments · Fixed by #41389
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@seth-p
Copy link
Contributor

seth-p commented Feb 3, 2015

As described in #9023 (comment), the way DataFrame.stack() and DataFrame.unstack() treat NaN indices is rather odd/inconsistent. Despite passing test_unstack_nan_index() in test_frame.py, I observe the following (this is from 0.15.2, but I think it's unchanged in the current master for 0.16.0):

In [140]: df = pd.DataFrame(np.arange(4).reshape(2, 2),
                            columns=pd.MultiIndex.from_tuples([('A','a'), ('B', 'b')],
                                                              names=['Upper', 'Lower']),
                            index=Index([0, 1], name='Num'), dtype=np.float64)

In [141]: df_nan = pd.DataFrame(np.arange(4).reshape(2, 2),
                                columns=pd.MultiIndex.from_tuples([('A',np.nan), ('B', 'b')],
                                                                  names=['Upper', 'Lower']),
                                index=Index([0, 1], name='Num'), dtype=np.float64)

In [148]: df
Out[148]:
Upper  A  B
Lower  a  b
Num
0      0  1
1      2  3

In [149]: df.stack()
Out[149]:
Upper       A   B
Num Lower
0   a       0 NaN
    b     NaN   1
1   a       2 NaN
    b     NaN   3

In [150]: df.T.unstack().T
Out[150]:
Upper       A   B
Num Lower
0   a       0 NaN
    b     NaN   1
1   a       2 NaN
    b     NaN   3

In [151]: df_nan
Out[151]:
Upper   A  B
Lower NaN  b
Num
0       0  1
1       2  3

In [152]: df_nan.stack()
Out[152]:
Upper      A  B
Num Lower
0   NaN    0  1
    b      0  1
1   NaN    2  3
    b      2  3

In [153]: df_nan.T.unstack().T
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-153-edbcaeb64f64> in <module>()
----> 1 df_nan.T.unstack().T

C:\Python34\lib\site-packages\pandas\core\frame.py in unstack(self, level)
   3486         """
   3487         from pandas.core.reshape import unstack
-> 3488         return unstack(self, level)
   3489
   3490     #----------------------------------------------------------------------

C:\Python34\lib\site-packages\pandas\core\reshape.py in unstack(obj, level)
    439     if isinstance(obj, DataFrame):
    440         if isinstance(obj.index, MultiIndex):
--> 441             return _unstack_frame(obj, level)
    442         else:
    443             return obj.T.stack(dropna=False)

C:\Python34\lib\site-packages\pandas\core\reshape.py in _unstack_frame(obj, level)
    479     else:
    480         unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 481                                value_columns=obj.columns)
    482         return unstacker.get_result()
    483

C:\Python34\lib\site-packages\pandas\core\reshape.py in __init__(self, values, index, level, value_columns)
    101
    102         self._make_sorted_values_labels()
--> 103         self._make_selectors()
    104
    105     def _make_sorted_values_labels(self):

C:\Python34\lib\site-packages\pandas\core\reshape.py in _make_selectors(self)
    143
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147

ValueError: Index contains duplicate entries, cannot reshape
@seth-p
Copy link
Contributor Author

seth-p commented Feb 3, 2015

cc: @cpcloud, re: #7466

@behzadnouri
Copy link
Contributor

@seth-p see #9061 and #9292

In [13]: df_nan
Out[13]:
Upper   A  B
Lower NaN  b
Num
0       0  1
1       2  3

In [14]: df_nan.T.unstack().T
Out[14]:
Upper       A   B
Num Lower
0   NaN     0 NaN
    b     NaN   1
1   NaN     2 NaN
    b     NaN   3

@seth-p
Copy link
Contributor Author

seth-p commented Feb 3, 2015

Thanks, @behzadnouri. Sorry I missed those.

Unless I'm missing something, the problem with stack() remains.

In [140]: df = pd.DataFrame(np.arange(4).reshape(2, 2),
                            columns=pd.MultiIndex.from_tuples([('A','a'), ('B', 'b')],
                                                              names=['Upper', 'Lower']),
                            index=Index([0, 1], name='Num'), dtype=np.float64)

In [141]: df_nan = pd.DataFrame(np.arange(4).reshape(2, 2),
                                columns=pd.MultiIndex.from_tuples([('A',np.nan), ('B', 'b')],
                                                                  names=['Upper', 'Lower']),
                                index=Index([0, 1], name='Num'), dtype=np.float64)

In [148]: df
Out[148]:
Upper  A  B
Lower  a  b
Num
0      0  1
1      2  3

In [149]: df.stack()
Out[149]:
Upper       A   B
Num Lower
0   a       0 NaN
    b     NaN   1
1   a       2 NaN
    b     NaN   3

In [151]: df_nan
Out[151]:
Upper   A  B
Lower NaN  b
Num
0       0  1
1       2  3

In [152]: df_nan.stack()
Out[152]:
Upper      A  B
Num Lower
0   NaN    0  1
    b      0  1
1   NaN    2  3
    b      2  3

@seth-p seth-p changed the title BUG: NaN indices BUG: NaN level values in stack() and unstack() Feb 4, 2015
@jreback jreback added API Design Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode MultiIndex labels Feb 5, 2015
@jreback
Copy link
Contributor

jreback commented Feb 5, 2015

will mark this as a bug. clearly an edge case, but @behzadnouri fixed this for unstack so....

@jreback jreback added the Bug label Feb 5, 2015
@jreback jreback added this to the Next Major Release milestone Aug 11, 2016
@mroeschke
Copy link
Member

This looks fixed on master. Could use a test

In [24]: df_nan.stack()
Out[24]:
Upper        A    B
Num Lower
0   NaN    0.0  NaN
    b      NaN  1.0
1   NaN    2.0  NaN
    b      NaN  3.0

In [25]: pd.__version__
Out[25]: '1.3.0.dev0+1287.g895f0b4022'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 12, 2021
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.3 May 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
5 participants