REGR: geometry column not found after groupby & column select #3165

snowman2 opened this issue Jan 31, 2024 · 4 comments · May be fixed by #3173

snowman2 (Contributor) commented Jan 31, 2024

Related:

Code Sample, a copy-pastable example

import geopandas
gdf = geopandas.read_file(
    "https://raw.githubusercontent.com/corteva/geocube/master/test/test_data/input/soil_data_group.geojson"
)

for _, df_group in gdf.groupby("hzdept_r")[["sandtotal_r", "geometry"]]:
    # Raises AttributeError with geopandas 0.14.3 and pandas<2.2
    df_group.geometry.name

Problem description

  • FAIL: geopandas 0.14.3 and pandas<2.2
  • PASS: geopandas 0.14.3 and pandas>=2.2
  • PASS: geopandas 0.14.2 and pandas<2.2 or pandas>=2.2

Traceback (most recent call last):
  File "debug.py", line 8, in <module>
    df_group.geometry.name
    ^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/geo/lib/python3.11/site-packages/pandas/core/generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mambaforge/envs/geo/lib/python3.11/site-packages/geopandas/geodataframe.py", line 236, in _get_geometry
    raise AttributeError(msg)
AttributeError: You are calling a geospatial method on the GeoDataFrame, but the active geometry column to use has not been set. 
There are columns with geometry data type (['geometry']), and you can either set one as the active geometry with df.set_geometry("name") or access the column as a GeoSeries (df["name"]) and call the method directly on it.

Expected Output

Output of geopandas.show_versions()

SYSTEM INFO

python : 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
executable : ~/mambaforge/envs/geo/bin/python
machine : Linux-6.5.0-15-generic-x86_64-with-glibc2.35

GEOS, GDAL, PROJ INFO

GEOS : 3.12.1
GEOS lib : None
GDAL : 3.8.2
GDAL data dir: ~/mambaforge/envs/geo/share/gdal
PROJ : 9.3.1
PROJ data dir: ~/mambaforge/envs/geo/share/proj

PYTHON DEPENDENCIES

geopandas : 0.14.3
numpy : 1.26.2
pandas : 2.1.4
pyproj : 3.6.1
shapely : 2.0.2
fiona : 1.9.5
geoalchemy2: 0.14.3
geopy : None
matplotlib : 3.8.2
mapclassify: 2.6.1
pygeos : None
pyogrio : None
psycopg2 : 2.9.9 (dt dec pq3 ext lo64)
pyarrow : None
rtree : 1.1.0

m-richards (Member)

Thanks for the report @snowman2! Looks like it's from #3080 as well. This keeps on giving.

m-richards (Member) commented Feb 1, 2024

Ah, there's a reason I have déjà vu: this is closely related to, or the same as, #2966 (comment), I think, and the fix in pandas 2.2 is pandas-dev/pandas#56761. I guess we missed this implication when backporting it to silence errors.

The return GeoDataFrame._from_mgr(mgr, axes) line in _constructor_from_mgr doesn't preserve metadata natively, so we rely on a follow-up call to __finalize__. Previously this wasn't an issue, because the geometry column was named "geometry" and _geodataframe_constructor_with_fallback caught that case.
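A toy model can make the metadata problem concrete. This is a hedged, pure-Python sketch and not geopandas code: ToyFrame, from_columns, finalize and select are hypothetical stand-ins for GeoDataFrame, _from_mgr, __finalize__ and a column selection, geometries are represented as plain WKT-like strings, and the column values are made up.

```python
# Toy model only, NOT geopandas internals: the "active geometry" name is
# metadata that is lost when an object is rebuilt from its data alone.


class ToyFrame:
    """Hypothetical stand-in for GeoDataFrame: columns plus one metadata field."""

    def __init__(self, columns, geometry_name=None):
        self.columns = dict(columns)
        self.geometry_name = geometry_name

    @classmethod
    def from_columns(cls, columns):
        # Analogue of _from_mgr: built from data only, so geometry_name is None.
        return cls(columns)

    def finalize(self, other):
        # Analogue of __finalize__: copy the metadata from the source object.
        self.geometry_name = other.geometry_name
        return self


def select(frame, names, finalize=True):
    """Hypothetical column selection built on from_columns.

    finalize=False mimics a code path where the follow-up finalize never runs.
    """
    out = ToyFrame.from_columns({n: frame.columns[n] for n in names})
    return out.finalize(frame) if finalize else out


src = ToyFrame(
    {"sandtotal_r": [40.0, 42.5], "geometry": ["POINT (0 0)", "POINT (1 1)"]},
    geometry_name="geometry",
)
with_finalize = select(src, ["sandtotal_r", "geometry"])
without_finalize = select(src, ["sandtotal_r", "geometry"], finalize=False)
print(with_finalize.geometry_name)     # geometry
print(without_finalize.geometry_name)  # None
```

Both selections hold identical data; only the path that runs the finalize step knows which column is the active geometry, which mirrors why a missing __finalize__ call surfaces as "active geometry column not set".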

Perhaps we should have

    def _constructor_from_mgr(self, mgr, axes):
        # replicate _geodataframe_constructor_with_fallback behaviour
        return _geodataframe_constructor_with_fallback(
            pd.DataFrame._from_mgr(mgr, axes)
        )

and try to improve the sindex caching directly.

jorisvandenbossche (Member)

> Perhaps we should have

Another alternative could be to mimic what the constructor was doing in the else block, which currently just does return GeoDataFrame._from_mgr(mgr, axes).

Previously, this mgr was passed to GeoDataFrame(...) instead. My understanding is that in that case:

  • the mgr was passed to super().__init__(), thus setting up the DataFrame (that is now handled by our inherited _from_mgr, I think)
  • if there was a column named "geometry", we would try to ensure this was a geometry dtype column, and when this succeeds, set the geometry name to "geometry"

That's it, I think. So IIUC, we just set the active geometry column name to "geometry" if that column name was present and contained geometries.

That's something we could also mimic here to avoid the overhead of the full constructor?
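The lightweight step described above can be sketched as a toy. This is a hedged, pure-Python illustration and not geopandas code: Point, ensure_geometry and infer_geometry_name are hypothetical names, and the sketch only decides which column name to mark as active; it does not store a converted column the way the real constructor does.

```python
# Toy sketch, NOT geopandas code: decide the active geometry name cheaply
# instead of re-running a full constructor.
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a shapely geometry."""

    x: float
    y: float


def ensure_geometry(values):
    """Hypothetical conversion step: raise TypeError unless every value is a geometry (or None)."""
    for v in values:
        if v is not None and not isinstance(v, Point):
            raise TypeError(f"not a geometry: {v!r}")
    return list(values)


def infer_geometry_name(columns, geometry_name=None):
    """Return the active geometry column name for a frame rebuilt from raw data."""
    if geometry_name is not None:
        return geometry_name  # metadata survived, nothing to infer
    if "geometry" not in columns:
        return None
    try:
        ensure_geometry(columns["geometry"])
    except TypeError:
        return None  # a "geometry" column exists but holds no geometries
    return "geometry"


print(infer_geometry_name({"sandtotal_r": [40.0], "geometry": [Point(0, 0)]}))  # geometry
print(infer_geometry_name({"geometry": ["not a shape"]}))  # None
print(infer_geometry_name({"geom": [Point(0, 0)]}))  # None
```

The check is cheap compared with running the full constructor, but it only rescues a column literally named "geometry"; a renamed active geometry column would still depend on __finalize__ carrying the metadata.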

m-richards (Member)

> That's it, I think. So IIUC, we just set the active geometry column name to "geometry" if that column name was present and contained geometries.

I think that's right.

@jorisvandenbossche jorisvandenbossche added this to the 0.14.x milestone Feb 12, 2024