Code Sample, a copy-pastable example if possible
Sadly, I am having trouble creating a small reproducible example of this issue, as the problem seems to disappear if I pickle the dataframes and re-load them. I can only see what is happening in a pdb session, and only for large MultiIndexed dataframes, which makes them hard to analyze.
Given two dataframes, `a` and `b`, with identical indices and different columns, and a unique level for each row called `unique_level`:
Problem description
Whenever I concatenate two MultiIndexed dataframes with over 10,000 rows (10 levels in index, 1 level in columns), the dataframes are merged correctly (expected shape), but the columns of the second dataframe are transformed to NaNs.
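A minimal sketch of the kind of setup described above, assuming three index levels instead of ten and randomly generated values. The level names `group` and `kind`, the column names `col_a` and `col_b`, and the row count are hypothetical stand-ins, not the original data, and this sketch is not known to reproduce the problem.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 3 index levels instead of the 10 in the report.
n_rows = 12000
rng = np.random.RandomState(0)

index = pd.MultiIndex.from_arrays(
    [
        rng.randint(0, 5, size=n_rows),         # repeated values
        rng.choice(list("xyz"), size=n_rows),   # repeated values
        np.arange(n_rows),                      # one distinct value per row
    ],
    names=["group", "kind", "unique_level"],
)

a = pd.DataFrame({"col_a": rng.rand(n_rows)}, index=index)

# b carries the same index entries as a, but in a different row order.
order = rng.permutation(n_rows)
b = pd.DataFrame({"col_b": rng.rand(n_rows)}, index=index[order])

# Column-wise concatenation aligns the rows on the index.
merged = pd.concat([a, b], axis=1)

print(merged.shape)
# Number of NaNs in the column that came from the second dataframe.
print(int(merged["col_b"].isna().sum()))
```

With matching index entries, the second print is the number to watch: any value above zero means rows of `b` were not matched to rows of `a`.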
I noticed that if I slice the dataframes as shown above (I do not use iloc because the rows are not ordered) to 10,000 rows or fewer, this does not happen. This is what the `right` dataframe shows, while the issue appears again for larger dataframes, like `wrong`.
I noticed the same issue with `reindex_axis`, e.g.:
The indexes of `a` and `b` are seemingly identical, although in a different order, but I suspect that their underlying structure is somehow different, as in issue #20565, which causes trouble when concatenating or reindexing.
It is also very odd that the issue disappeared once I pickled and unpickled the dataframes.
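One way to probe the suspicion about the underlying index structure is to compare the two indexes level by level, then repeat the concatenation and reindexing after rebuilding the index or after an in-memory pickle round trip. The sketch below uses small hypothetical frames (level names `kind` and `part`, columns `col_a` and `col_b` are made up). It uses `reindex` rather than `reindex_axis`, which is deprecated since pandas 0.21 and absent from pandas 1.0 and later.

```python
import pickle

import numpy as np
import pandas as pd

# Hypothetical small frames standing in for a and b: same entries, reversed order.
index = pd.MultiIndex.from_product(
    [["x", "y"], [0, 1, 2]], names=["kind", "part"]
)
a = pd.DataFrame({"col_a": np.arange(6.0)}, index=index)
b = pd.DataFrame({"col_b": np.arange(6.0)}, index=index[::-1])

# 1. Do both indexes hold the same entries once the order is ignored?
same_entries = a.sort_index().index.equals(b.sort_index().index)
print("same entries:", same_entries)

# 2. Compare the stored level values and their dtypes, level by level.
for name, lev_a, lev_b in zip(a.index.names, a.index.levels, b.index.levels):
    print(name, lev_a.dtype, lev_b.dtype, lev_a.equals(lev_b))

# 3. Rebuild b's index from plain tuples so that its levels are recomputed.
b_rebuilt = b.copy()
b_rebuilt.index = pd.MultiIndex.from_tuples(b.index.tolist(), names=b.index.names)

# 4. In-memory pickle round trip, mirroring the save-and-reload observation.
b_roundtrip = pickle.loads(pickle.dumps(b))

# 5. Count NaNs in b's column after concat and after reindexing onto a's index.
nan_counts = {}
candidates = [("original", b), ("rebuilt", b_rebuilt), ("round trip", b_roundtrip)]
for label, frame in candidates:
    merged = pd.concat([a, frame], axis=1)
    reindexed = frame.reindex(a.index)
    nan_counts[label] = (
        int(merged["col_b"].isna().sum()),
        int(reindexed["col_b"].isna().sum()),
    )
print(nan_counts)
```

If the counts differ between the original frame and the rebuilt or round-tripped one, the difference lies in how the index is stored rather than in its visible values.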
I could not try this with the latest version of pandas, because the dataframes are the result of calculations that require the exact version I am using, but I am open to any suggestions for finding the smallest working example that I can try on the latest version.
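A possible way to shrink a failing pair of dataframes is to select subsets by the values of `unique_level` (so no positional slicing is needed) and check each subset size for NaNs that neither input contained. The helper names below (`count_unexpected_nans`, `subset_by_keys`, `first_failing_size`) and the demo data are hypothetical; this is a sketch of the search, not a confirmed reproduction.

```python
import numpy as np
import pandas as pd


def count_unexpected_nans(a, b):
    """Concatenate column-wise and count NaNs that neither input contained."""
    merged = pd.concat([a, b], axis=1)
    already_there = int(a.isna().sum().sum()) + int(b.isna().sum().sum())
    return int(merged.isna().sum().sum()) - already_there


def subset_by_keys(frame, keys, level):
    """Keep the rows whose value in `level` is one of `keys` (order-independent)."""
    mask = frame.index.get_level_values(level).isin(keys)
    return frame[mask]


def first_failing_size(a, b, sizes, level="unique_level"):
    """Return the smallest tested row count that yields unexpected NaNs, else None."""
    all_keys = a.index.get_level_values(level)
    for n in sorted(sizes):
        keys = all_keys[:n]
        small_a = subset_by_keys(a, keys, level)
        small_b = subset_by_keys(b, keys, level)
        if count_unexpected_nans(small_a, small_b) > 0:
            return n
    return None


# Hypothetical demo data: same index entries in a and b, different row order.
n_rows = 200
rng = np.random.RandomState(1)
index = pd.MultiIndex.from_arrays(
    [rng.randint(0, 4, size=n_rows), np.arange(n_rows)],
    names=["group", "unique_level"],
)
a = pd.DataFrame({"col_a": rng.rand(n_rows)}, index=index)
b = pd.DataFrame({"col_b": rng.rand(n_rows)}, index=index[rng.permutation(n_rows)])

print(first_failing_size(a, b, sizes=[50, 100, 200]))
```

On the real data, the tested sizes would bracket the reported threshold, for example 10,000 and 10,001 rows.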
Thanks!
Expected Output
Same as for a dataframe with fewer than 10,001 rows
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.5.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 19.1
setuptools: 40.6.3
Cython: 0.28.5
numpy: 1.14.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.1
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0