BUG: explode() raises ValueError #1223

martinfleis · 2019-11-26T09:33:59Z

Hi,

As reported in https://github.com/martinfleis/momepy/issues/123, in certain situations gdf.explode() raises ValueError: Shape of passed values is (132850, 183), indices imply (132842, 183). The data were retrieved from OSM using OSMnx. (Warning: the Vancouver gdf is large.)

import geopandas as gpd
import osmnx as ox

gdf = ox.footprints.footprints_from_place(place='Vancouver, Canada')
gdf_projected = ox.project_gdf(gdf)
exploded = gdf_projected.explode()

I tried to save a small set to geojson, but after loading back to geopandas it does not cause the error 🤔


jorisvandenbossche · 2019-11-26T10:17:38Z

Smaller reproducer:

Most of the entries don't get exploded, only a few are actual MultiPolygons with multiple parts. Taking the first + one that gets exploded (found from gdf.geometry.explode(), on the GeoSeries it works), still gives the error:

In [26]: subset = gdf.loc[[23253981, 4761998], :] 

In [27]: subset  
Out[27]: 
                                                      nodes                                           geometry     building addr:housenumber  ... check_date bridge opening_date          type
23253981  [251629948, 3607852090, 3607852091, 251629949,...  POLYGON ((-123.0727049 49.2147746, -123.073652...       school              NaN  ...        NaN    NaN          NaN           NaN
4761998                                                 NaN  (POLYGON ((-123.1615685 49.2642942, -123.16157...  residential             2475  ...        NaN    NaN          NaN  multipolygon

[2 rows x 182 columns]

In [28]: subset.explode() 
...
ValueError: Shape of passed values is (3, 183), indices imply (2, 183)

Further taking some columns as well:

In [30]: subset = subset[subset.columns[:5]].copy()

Now, what I noticed when debugging this, is that it is the 2D object block that doesn't get reshaped correctly:

In [32]: subset.explode()   
...
ValueError: Shape of passed values is (3, 6), indices imply (2, 6)

In [33]: %debug
> /home/joris/miniconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py(1718)construction_error()
   1716         raise ValueError("Empty data passed with indices specified.")
   1717     raise ValueError(
-> 1718         "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
   1719     )
   1720 

ipdb> u  
> /home/joris/miniconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py(345)_verify_integrity()
    343         for block in self.blocks:
    344             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 345                 construction_error(tot_items, block.shape[1:], self.axes)
    346         if len(self.items) != tot_items:
    347             raise AssertionError(

ipdb> p self.blocks
(ObjectBlock: slice(0, 4, 1), 4 x 2, dtype: object, IntBlock: slice(4, 5, 1), 1 x 3, dtype: int64, ObjectBlock: slice(5, 6, 1), 1 x 3, dtype: object)
#                                 |-> 2 rows                                      |-> 3 rows                                        |-> 3 rows

And the original dataframe is also all object dtype (the geometry column as well, but that's just because I am debugging on geopandas 0.5 where I had osmnx installed):

In [34]: subset.dtypes
Out[34]: 
nodes               object
geometry            object
building            object
addr:housenumber    object
addr:street         object
dtype: object

So let's see if changing some columns to a non-object dtype helps. It doesn't:

In [38]:  subset['addr:housenumber'] = subset['addr:housenumber'].astype(float) 

In [39]: subset[['addr:housenumber', 'geometry']].explode() 
...
ValueError: Shape of passed values is (3, 3), indices imply (2, 3)

Another specific thing about this dataset is that it has high integer indices (not the default 0, 1, 2, ..., n):

In [42]: subset[['addr:housenumber', 'geometry']].reset_index(drop=True).explode()
Out[42]: 
     addr:housenumber                                           geometry
0 0               NaN  POLYGON ((-123.0727049 49.2147746, -123.073652...
1 0            2475.0  POLYGON ((-123.1615685 49.2642942, -123.161570...
  1            2475.0  POLYGON ((-123.1622072 49.2643049, -123.162209...

That seems to fix it! And it also does fix it on the original data:

In [45]: gdf.reset_index(drop=True).explode()
Out[45]: 
                                                      nodes       building addr:housenumber        addr:street  ... bridge opening_date          type                                           geometry
0      0  [251629948, 3607852090, 3607852091, 251629949,...         school              NaN                NaN  ...    NaN          NaN           NaN  POLYGON ((-123.0727049 49.2147746, -123.073652...
1      0  [268527777, 472917394, 268527778, 3099866715, ...        stadium              777  Pacific Boulevard  ...    NaN          NaN           NaN  POLYGON ((-123.1135167 49.2763119, -123.113285...
2      0  [1845869695, 1845869693, 268527967, 3714369280...        stadium              800      Griffiths Way  ...    NaN          NaN           NaN  POLYGON ((-123.109011 49.278442, -123.1088138 ...
3      0  [366639854, 1578563638, 1578563641, 1578563640...  train_station             1150     Station Street  ...    NaN          NaN           NaN  POLYGON ((-123.0981085 49.2741719, -123.098080...
4      0  [370490167, 5577882816, 5577882808, 5577882809...            yes             1661      Parker Street  ...    NaN          NaN           NaN  POLYGON ((-123.0709845 49.276187, -123.0710625...
...                                                     ...            ...              ...                ...  ...    ...          ...           ...                                                ...
132837 0                                                NaN     commercial              312        Main Street  ...    NaN          NaN  multipolygon  POLYGON ((-123.0994331 49.2817602, -123.099421...
132838 0                                                NaN            yes              NaN                NaN  ...    NaN          NaN  multipolygon  POLYGON ((-123.1289575 49.227361, -123.1287171...
132839 0                                                NaN            yes              NaN                NaN  ...    NaN          NaN  multipolygon  POLYGON ((-123.096785 49.2618756, -123.0967933...
132840 0                                                NaN            yes              966   West 14th Avenue  ...    NaN          NaN  multipolygon  POLYGON ((-123.1258587 49.2585495, -123.125811...
132841 0                                                NaN     apartments             3736  Commercial Street  ...    NaN          NaN  multipolygon  POLYGON ((-123.0679012 49.2515969, -123.067492...

[132850 rows x 182 columns]

So at least, that gives the original reporter a workaround.
And hopefully those pointers can also help us find the cause ;)

martinfleis · 2020-01-04T15:16:39Z

It is related to some quite weird behaviour of pd.concat. The following works (simulating the behaviour of our explode):

import pandas as pd

df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george'], ['a', 'b']],
                   columns=['animal', 'name'])
df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])

df4.index = [0, 1, 1]

pd.concat([df1, df4], axis=1)

	letter	number	animal	name
0	a	1	bird	polly
1	b	2	monkey	george
1	b	2	a	b

But if the order of index values is the opposite, it raises ValueError:

df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george'], ['a', 'b']],
                   columns=['animal', 'name'])
df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])

df4.index = [1, 0, 0]

pd.concat([df1, df4], axis=1)

ValueError: Shape of passed values is (3, 4), indices imply (2, 4)

For some reason, the index has to be sorted (that is why it works if you do reset_index). Using subset from above:

sort = subset.sort_index()
sort.explode()

               building                                           geometry
4761998  0  residential  POLYGON ((-123.16157 49.26429, -123.16157 49.2...
         1  residential  POLYGON ((-123.16221 49.26430, -123.16221 49.2...
23253981 0       school  POLYGON ((-123.07270 49.21477, -123.07365 49.2...

Explode then works as intended. I will fix that by storing the original order, sorting, and restoring the order at the end.
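A rough sketch of that store-sort-restore idea is below. It is not the actual geopandas patch: the helper name explode_in_sorted_order, the scratch column _orig_order, and the use of pandas' own DataFrame.explode on tuples as a stand-in for gdf.explode() are all assumptions made for illustration.

```python
import pandas as pd


def explode_in_sorted_order(df, explode_fn, order_col="_orig_order"):
    """Sort the index, explode, then restore the original row order.

    `explode_fn` is any callable that explodes a frame and repeats the
    attribute columns for every part (for a GeoDataFrame this could be
    `lambda g: g.explode()`). `order_col` is a hypothetical scratch column.
    """
    if order_col in df.columns:
        raise ValueError(f"column {order_col!r} already exists")
    # Remember the original position of every row.
    tagged = df.assign(**{order_col: list(range(len(df)))})
    # Explode on a sorted index; mergesort is stable, so rows sharing a
    # label keep their relative order.
    exploded = explode_fn(tagged.sort_index(kind="mergesort"))
    # Put parents back in their original order; the stable sort keeps the
    # parts of each parent in sequence.
    restored = exploded.sort_values(order_col, kind="mergesort")
    return restored.drop(columns=order_col)


# Stand-in data: tuples play the role of multi-part geometries.
demo = pd.DataFrame(
    {"building": ["school", "residential"], "parts": [("a",), ("b", "c")]},
    index=[23253981, 4761998],
)
result = explode_in_sorted_order(demo, lambda d: d.explode("parts"))
```

With a real GeoDataFrame the call would presumably be explode_in_sorted_order(gdf, lambda g: g.explode()), assuming explode carries the scratch column along with the other attributes.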

jorisvandenbossche · 2020-01-24T16:52:07Z

> But if the order of index values is the opposite, it raises ValueError:

Hmm, that seems a bug. Can you report that to pandas?

martinfleis · 2020-01-27T22:09:10Z

Will be fixed in pandas-dev/pandas#31113, closing here. The workaround for now is to call sort_index() before exploding.

seizethedata · 2020-03-07T13:51:09Z

@martinfleis unfortunately, sort_index() didn't work for me. I still get the error ValueError: Shape of passed values is (3103, 2), indices imply (3089, 2)

martinfleis · 2020-03-07T13:52:51Z

@seizethedata Try reset_index() before exploding.

seizethedata · 2020-03-07T13:55:06Z

@martinfleis I did that too.

My code is:

blocks_saintp_cl = blocks_saintp_clean.reset_index(drop=True) 
blocks = blocks_saintp_cl.sort_index()
sp_blocks = blocks.explode().reset_index(drop=True)

martinfleis · 2020-03-07T13:56:51Z

You have to call reset_index before exploding, not after. If you reset the index, you don't need to sort it.

sp_blocks = blocks.reset_index(drop=True).explode()

Edit: you changed the code in the meantime. The one above should work, so it looks like a different issue. Can you make a minimal reproducible example or share the data by any chance?

seizethedata · 2020-03-07T13:58:24Z

Still the same, unfortunately.

Edit: Sorry, I was inserting the code brackets wrong.

seizethedata · 2020-03-07T14:01:03Z

@martinfleis I can share the data privately, if that's possible!

martinfleis · 2020-03-07T14:01:57Z

Send them to [email protected].

seizethedata · 2020-03-07T14:05:05Z

I've sent the geojson to you

martinfleis · 2020-03-07T17:26:18Z

@seizethedata This bug is super strange with your data. I wasn't able to figure out what happens there, nor find a workaround with the current version of geopandas. But I was able to patch explode to work with your data: #1319.

martinfleis · 2020-03-07T18:39:03Z

Update: if you want to check a properly working patch, use #1251. I am reopening this issue to keep an eye on it, as it was supposed to be fixed in pandas but that has not happened yet.

If we get close to a release, I'll merge #1251 as a temporary patch until pandas fixes it.

seizethedata · 2020-03-10T16:21:18Z

@martinfleis thanks!

Sieboldianus · 2020-08-19T07:21:23Z

I appear to have the same bug after using dissolve on a specific country in geopandas naturalearth_lowres. To reproduce:

import geopandas as gp

# Mollweide projection epsg code
EPSG_CODE = 54009
# note: Mollweide defined by _esri_
# in epsg.io's database
CRS_PROJ = f"esri:{EPSG_CODE}"
CRS_WGS = "epsg:4326"

world = gp.read_file(
    gp.datasets.get_path('naturalearth_lowres'),
    crs=CRS_WGS)
world = world.to_crs(CRS_PROJ)
uk = world[world['name'] == "United Kingdom"]
fr = world[world['name'] == "France"]
# remove polygon from French Guiana
# and join back together as multipolygon
fr = fr.explode().iloc[1:].dissolve(by='name')

# the following works:
uk.explode()
# but not on France:
fr.explode()
# however, explode works with the workaround from martinfleis:
exploded_geom = fr.geometry.explode().reset_index(level=-1)
exploded_index = exploded_geom.columns[0]
fr_exploded = fr.drop(fr._geometry_column_name, axis=1).join(exploded_geom)
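
The workaround above sidesteps the failing step by exploding the geometry on its own and reattaching the attributes afterwards. A dependency-light sketch of a related idea follows: repeat the attribute rows by position instead of aligning on labels, so neither index order nor duplicate labels matter. This is an illustration only, not how geopandas implements explode; explode_positionally and the tuple-valued parts column are hypothetical stand-ins for a GeoDataFrame and its multi-part geometries.

```python
import pandas as pd


def explode_positionally(df, column):
    """Explode a column of sequences using row positions only."""
    lengths = [len(parts) for parts in df[column]]
    # Position of the parent row for every output row, e.g. [0, 1, 1].
    parent_pos = [i for i, n in enumerate(lengths) for _ in range(n)]
    # Part number within each parent, e.g. [0, 0, 1].
    part_no = [k for n in lengths for k in range(n)]
    # iloc repeats rows by position, so no label alignment takes place.
    out = df.iloc[parent_pos].copy()
    # Assigning a plain list is positional as well.
    out[column] = [part for parts in df[column] for part in parts]
    # Mimic the (original label, part number) MultiIndex seen in the thread.
    out.index = pd.MultiIndex.from_arrays([out.index, part_no])
    return out


# Unsorted, high integer labels as in the OSM data discussed earlier.
demo = pd.DataFrame(
    {"building": ["school", "residential"], "parts": [("a",), ("b", "c")]},
    index=[23253981, 4761998],
)
exploded = explode_positionally(demo, "parts")
```

Because every step is positional, the sketch does not depend on pd.concat or on the index being sorted or unique.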

jack-tuna · 2021-10-21T17:41:22Z

I have the same issue. A workaround for me was to use QGIS to convert the vector to single parts and export it.

martinfleis self-assigned this Jan 4, 2020

martinfleis mentioned this issue Jan 4, 2020

BUG: explode() raises ValueError for unordered index #1251

Closed

martinfleis mentioned this issue Jan 25, 2020

pd.concat inconsistent with non-unique index pandas-dev/pandas#31308

Open

martinfleis closed this as completed Jan 27, 2020

martinfleis mentioned this issue Mar 7, 2020

BUG: patch gdf.explode() concat issue #1319

Closed

martinfleis reopened this Mar 7, 2020

martinfleis mentioned this issue Aug 23, 2020

BUG: Keep geoms within geometry collections after overlay #1582

Merged

martinfleis mentioned this issue May 6, 2021

BUG: exploding gdfs with MultiIndex raises NotImplementedError #1937

Closed
