REGR: geometry column not found after groupby & column select #3165

snowman2 · 2024-01-31T21:59:32Z

BUG: pandas 2.2.0 changes GeoDataFrame.from_records() behavior #3152
REGR: directly fix _constructor_from_mgr regression #3159 / COMPAT: add pandas blockmanager alternative apis for _constructor like things #3080 (maybe, maybe not ...)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of geopandas.

Code Sample, a copy-pastable example

import geopandas
gdf = geopandas.read_file(
    "https://raw.githubusercontent.com/corteva/geocube/master/test/test_data/input/soil_data_group.geojson"
)

for _, df_group in gdf.groupby("hzdept_r")[["sandtotal_r", "geometry"]]:
    df_group.geometry.name

Problem description

FAIL: geopandas 0.14.3 and pandas<2.2
PASS: geopandas 0.14.3 and pandas>=2.2
PASS: geopandas 0.14.2 and pandas<2.2 or pandas>=2.2

  File "debug.py", line 8, in <module>
    df_group.geometry.name
    ^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/geo/lib/python3.11/site-packages/pandas/core/generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/geo/lib/python3.11/site-packages/geopandas/geodataframe.py", line 236, in _get_geometry
    raise AttributeError(msg)
AttributeError: You are calling a geospatial method on the GeoDataFrame, but the active geometry column to use has not been set. 
There are columns with geometry data type (['geometry']), and you can either set one as the active geometry with df.set_geometry("name") or access the column as a GeoSeries (df["name"]) and call the method directly on it.
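
As a stopgap on the affected version combination, the two remedies named in the error message can be sketched as below. This is only an illustration: the small in-memory frame stands in for the remote GeoJSON file, and its values are invented. Only the column names come from the report.

```python
import geopandas
from shapely.geometry import Point

# Small in-memory stand-in for the soil dataset in the report
# (column names copied from the report, values invented for illustration).
gdf = geopandas.GeoDataFrame(
    {
        "hzdept_r": [0, 0, 5],
        "sandtotal_r": [10.0, 20.0, 30.0],
        "geometry": [Point(0, 0), Point(1, 1), Point(2, 2)],
    }
)

names = []
for _, df_group in gdf.groupby("hzdept_r")[["sandtotal_r", "geometry"]]:
    # Remedy 1: access the column by name, which does not depend on the
    # active-geometry metadata surviving the groupby.
    names.append(df_group["geometry"].name)
    # Remedy 2: re-declare the active geometry column before using .geometry.
    names.append(df_group.set_geometry("geometry").geometry.name)
```

Both remedies should be harmless on the unaffected combinations, since they only restate which column holds the geometries.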

Expected Output

Output of `geopandas.show_versions()`

SYSTEM INFO

python : 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
executable : ~/mambaforge/envs/geo/bin/python
machine : Linux-6.5.0-15-generic-x86_64-with-glibc2.35

GEOS, GDAL, PROJ INFO

GEOS : 3.12.1
GEOS lib : None
GDAL : 3.8.2
GDAL data dir: ~/mambaforge/envs/geo/share/gdal
PROJ : 9.3.1
PROJ data dir: ~/mambaforge/envs/geo/share/proj

PYTHON DEPENDENCIES

geopandas : 0.14.3
numpy : 1.26.2
pandas : 2.1.4
pyproj : 3.6.1
shapely : 2.0.2
fiona : 1.9.5
geoalchemy2: 0.14.3
geopy : None
matplotlib : 3.8.2
mapclassify: 2.6.1
pygeos : None
pyogrio : None
psycopg2 : 2.9.9 (dt dec pq3 ext lo64)
pyarrow : None
rtree : 1.1.0


m-richards · 2024-02-01T10:37:09Z

Thanks for the report @snowman2! Looks like it's from #3080 as well. This keeps on giving.

m-richards · 2024-02-01T10:50:46Z

Ah, there's a reason I have deja vu: this is closely related to, or the same as, #2966 (comment), I think, and the fix in pandas 2.3 is pandas-dev/pandas#56761. I guess we missed this as an implication of backporting this to silence errors.

The `return GeoDataFrame._from_mgr(mgr, axes)` line in `_constructor_from_mgr` doesn't preserve metadata natively, so we're reliant on a follow-up call to `__finalize__`. Previously this wasn't an issue, because the geometry column was called geometry and `_geodataframe_constructor_with_fallback` caught that case.

Perhaps we should have

    def _constructor_from_mgr(self, mgr, axes):
        # replicate _geodataframe_constructor_with_fallback behaviour
        return _geodataframe_constructor_with_fallback(
            pd.DataFrame._from_mgr(mgr, axes)
        )

and try to improve the sindex caching directly.

jorisvandenbossche · 2024-02-04T15:40:42Z

> Perhaps we should have

Another alternative could be to mimic what the constructor was doing in the else block, which currently just does `return GeoDataFrame._from_mgr(mgr, axes)`.

So before, this mgr was passed to GeoDataFrame(..) instead. And my understanding is that then:

- the mgr was passed to super().__init__(), thus setting up the DataFrame (that is now handled by our inherited _from_mgr, I think)
- if there was a column named "geometry", we would try to ensure this was a geometry dtype column, and when this succeeds, set the geometry name to "geometry"

That's it, I think. So IIUC, we just set the active geometry column name to "geometry" if that column name was present and contained geometries.

That's something we could also mimic here to avoid the overhead of the full constructor?
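
A minimal sketch of that idea, for illustration only: `restore_default_geometry` is a hypothetical helper name, not an existing geopandas function and not necessarily what the eventual fix looks like. It assumes the active geometry name is stored in the private `_geometry_column_name` attribute, and it clears that attribute by hand to simulate the state seen in the traceback.

```python
import geopandas
from geopandas.array import GeometryDtype
from shapely.geometry import Point


def restore_default_geometry(gdf):
    """Hypothetical helper: if no active geometry is set, fall back to a
    column named "geometry", provided it really holds geometries.

    Mutates ``gdf`` in place and returns it.
    """
    if gdf._geometry_column_name is None and "geometry" in gdf.columns:
        if isinstance(gdf["geometry"].dtype, GeometryDtype):
            gdf._geometry_column_name = "geometry"
    return gdf


gdf = geopandas.GeoDataFrame(
    {"value": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
)
# Simulate the regression: the column is present but the active geometry
# name has been lost (private attribute, touched here only for the demo).
gdf._geometry_column_name = None

restored = restore_default_geometry(gdf)
geometry_name = restored.geometry.name
```

The check is a single dtype lookup, so it would avoid the overhead of the full constructor mentioned above.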

m-richards · 2024-02-04T21:26:37Z

> That's it, I think. So IIUC, we just set the active geometry column name to "geometry" if that column name was present and contained geometries.

I think that's right.

snowman2 added the bug and needs triage labels Jan 31, 2024

m-richards removed the needs triage label Feb 1, 2024

m-richards linked a pull request Feb 5, 2024 that will close this issue

REGR: constructor from mgr geometry col preservation #3173

Open

jorisvandenbossche added this to the 0.14.x milestone Feb 12, 2024
