-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: constructor from mgr geometry col preservation #3173
REGR: constructor from mgr geometry col preservation #3173
Conversation
@@ -686,14 +686,16 @@ def func(group): | |||
df.groupby("value2").apply(func, **kwargs) | |||
# selecting the non-group columns -> no need to pass the keyword | |||
if ( | |||
compat.PANDAS_GE_21 if geometry_name == "geometry" else compat.PANDAS_GE_20 | |||
) and not compat.PANDAS_GE_22: | |||
compat.PANDAS_GE_22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just swapped the if and else block so I could negate the conditional. Can revert but I find this easier to read as here's the cases where it works, rather than the cases where it's broken.
@@ -1634,7 +1634,17 @@ def _constructor_from_mgr(self, mgr, axes): | |||
return _geodataframe_constructor_with_fallback( | |||
pd.DataFrame._from_mgr(mgr, axes) | |||
) | |||
return GeoDataFrame._from_mgr(mgr, axes) | |||
gdf = GeoDataFrame._from_mgr(mgr, axes) | |||
# _from_mgr doesn't preserve metadata (expect __finalize__ to be called) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we don't end up leaking more of init in here, the nice thing about wrapping everything in _geodataframe_constructor_with_fallback
would be that it's straightforward to maintain, even if there's some overhead - though I'm not saying we actually can do that directly because of the sindex caching.
geopandas/geodataframe.py
Outdated
# still need to mimic __init__ behaviour with geometry=None | ||
if (gdf.columns == "geometry").sum() == 1: # only if "geometry" is single col | ||
try: | ||
gdf["geometry"] = _ensure_geometry(gdf["geometry"].values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this _ensure_geometry
call?
Yes, to preserve the exact behaviour of before (and so maybe best to do that for 0.x), but longer term, we could just set the _geometry_column_name
(in this code path, we already checked that there is at least one column with geometry dtype, we just didn't check it is the one named "geometry")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably uncessary, but I don't know the pandas internals well enough to be sure - I would think if we get a blockmanager it's all or nothing, either all dtypes were conserved or they were all lost (I would expect e.g. apply doesn't come from _from_mgr, but I haven't checked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just tested that using
if (gdf.columns == "geometry").sum() == 1:
gdf._geometry_column_name = "geometry"
is enough to fix the example from the issue on pandas 2.1.4. So the try/except may not be needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's definitely not needed for this example, it's moreso that by implementing _constructor_from_mgr
there are pandas code paths in pandas which will use this instead of our _constructor
which dispatches to __init__
, and this behaviour is part of __init__
. I will remove it for now then, there's a possibility this could lead to a weird error after some kind of groupby apply type operation, but we can deal with that if it comes up
@m-richards do you remember what the status is here? If I try the original example from the issue, that seems to work fine for geopandas main with both pandas main and pandas 1.5 |
@jorisvandenbossche I believe it's only an issue with pandas 2.0 and 2.1 (I can replicate the original issue now with my pandas 2.1.4 env, and that it's fixed on this branch). The test modified indicates that this scenario is still problematic if the geometry column was not called "geometry" - we effectively hide another bug because Edit: more context, this I think is a bug we introduced by backporting the constructor_from_mgr_sliced etc changes to silence warnings in conjunction with a pandas bug fixed in 2.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@m-richards there is failure in the pandas 2.0 env that seems related |
Closes #3165 (in the specific case of
_geometry_column_name=="geometry"
- which is also the case which previously worked prior to the regression