REGR: constructor from mgr geometry col preservation #3173

m-richards · 2024-02-05T10:20:58Z

Closes #3165 (in the specific case of _geometry_column_name=="geometry" - which is also the case which previously worked prior to the regression

m-richards · 2024-02-05T10:23:17Z

geopandas/tests/test_pandas_methods.py

@@ -686,14 +686,16 @@ def func(group):
    df.groupby("value2").apply(func, **kwargs)
    # selecting the non-group columns -> no need to pass the keyword
    if (
-        compat.PANDAS_GE_21 if geometry_name == "geometry" else compat.PANDAS_GE_20
-    ) and not compat.PANDAS_GE_22:
+        compat.PANDAS_GE_22


I just swapped the if and else block so I could negate the conditional. Can revert but I find this easier to read as here's the cases where it works, rather than the cases where it's broken.

m-richards · 2024-02-05T10:24:38Z

geopandas/geodataframe.py

@@ -1634,7 +1634,17 @@ def _constructor_from_mgr(self, mgr, axes):
            return _geodataframe_constructor_with_fallback(
                pd.DataFrame._from_mgr(mgr, axes)
            )
-        return GeoDataFrame._from_mgr(mgr, axes)
+        gdf = GeoDataFrame._from_mgr(mgr, axes)
+        # _from_mgr doesn't preserve metadata (expect __finalize__ to be called)


I hope we don't end up leaking more of init in here, the nice thing about wrapping everything in _geodataframe_constructor_with_fallback would be that it's straightforward to maintain, even if there's some overhead - though I'm not saying we actually can do that directly because of the sindex caching.

jorisvandenbossche · 2024-02-06T09:30:49Z

geopandas/geodataframe.py

+        # still need to mimic __init__ behaviour with geometry=None
+        if (gdf.columns == "geometry").sum() == 1:  # only if "geometry" is single col
+            try:
+                gdf["geometry"] = _ensure_geometry(gdf["geometry"].values)


Do we need this _ensure_geometry call?
Yes, to preserve the exact behaviour of before (and so maybe best to do that for 0.x), but longer term, we could just set the _geometry_column_name (in this code path, we already checked that there is at least one column with geometry dtype, we just didn't check it is the one named "geometry")

I think it's probably uncessary, but I don't know the pandas internals well enough to be sure - I would think if we get a blockmanager it's all or nothing, either all dtypes were conserved or they were all lost (I would expect e.g. apply doesn't come from _from_mgr, but I haven't checked)

I've just tested that using

if (gdf.columns == "geometry").sum() == 1: gdf._geometry_column_name = "geometry"

is enough to fix the example from the issue on pandas 2.1.4. So the try/except may not be needed here.

Yeah it's definitely not needed for this example, it's moreso that by implementing _constructor_from_mgr there are pandas code paths in pandas which will use this instead of our _constructor which dispatches to __init__, and this behaviour is part of __init__. I will remove it for now then, there's a possibility this could lead to a weird error after some kind of groupby apply type operation, but we can deal with that if it comes up

jorisvandenbossche · 2024-05-19T13:23:03Z

@m-richards do you remember what the status is here?

If I try the original example from the issue, that seems to work fine for geopandas main with both pandas main and pandas 1.5

m-richards · 2024-05-21T12:09:38Z

@m-richards do you remember what the status is here?

If I try the original example from the issue, that seems to work fine for geopandas main with both pandas main and pandas 1.5

@jorisvandenbossche I believe it's only an issue with pandas 2.0 and 2.1 (I can replicate the original issue now with my pandas 2.1.4 env, and that it's fixed on this branch). The test modified indicates that this scenario is still problematic if the geometry column was not called "geometry" - we effectively hide another bug because _constructor_from_mgr will auto attempt to try "geometry", but from memory that's a mess to solve because it would mean depending on "self" in defining _constructor_from_mgr and there's a pandas issue to convert that instance method into a classmethod I think.

Edit: more context, this I think is a bug we introduced by backporting the constructor_from_mgr_sliced etc changes to silence warnings in conjunction with a pandas bug fixed in 2.3

martinfleis

Thanks!

jorisvandenbossche · 2024-06-07T15:43:05Z

@m-richards there is failure in the pandas 2.0 env that seems related

geopandas/tests/test_pandas_methods.py

more regression fixing

1a90a50

m-richards commented Feb 5, 2024

View reviewed changes

multiple columns called geometry fix

20b5ce4

jorisvandenbossche reviewed Feb 6, 2024

View reviewed changes

martinfleis added this to the 1.0 milestone Apr 28, 2024

remove try except

421144a

martinfleis approved these changes Jun 1, 2024

View reviewed changes

jorisvandenbossche reviewed Jun 7, 2024

View reviewed changes

geopandas/tests/test_pandas_methods.py Outdated Show resolved Hide resolved

Update geopandas/tests/test_pandas_methods.py

f47d1ea

jorisvandenbossche merged commit 1e6225d into geopandas:main Jun 7, 2024
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REGR: constructor from mgr geometry col preservation #3173

REGR: constructor from mgr geometry col preservation #3173

m-richards commented Feb 5, 2024

m-richards Feb 5, 2024

m-richards Feb 5, 2024

jorisvandenbossche Feb 6, 2024

m-richards Feb 6, 2024

martinfleis May 31, 2024

m-richards Jun 1, 2024

jorisvandenbossche commented May 19, 2024

m-richards commented May 21, 2024 •

edited

martinfleis left a comment

jorisvandenbossche commented Jun 7, 2024

REGR: constructor from mgr geometry col preservation #3173

REGR: constructor from mgr geometry col preservation #3173

Conversation

m-richards commented Feb 5, 2024

m-richards Feb 5, 2024

Choose a reason for hiding this comment

m-richards Feb 5, 2024

Choose a reason for hiding this comment

jorisvandenbossche Feb 6, 2024

Choose a reason for hiding this comment

m-richards Feb 6, 2024

Choose a reason for hiding this comment

martinfleis May 31, 2024

Choose a reason for hiding this comment

m-richards Jun 1, 2024

Choose a reason for hiding this comment

jorisvandenbossche commented May 19, 2024

m-richards commented May 21, 2024 • edited

martinfleis left a comment

Choose a reason for hiding this comment

jorisvandenbossche commented Jun 7, 2024

m-richards commented May 21, 2024 •

edited