
Use geo accessor for GeoSeries methods? #166

Open
shoyer opened this issue Sep 19, 2014 · 13 comments · May be fixed by #3272

Comments

@shoyer

shoyer commented Sep 19, 2014

The next version of pandas will introduce a .dt Series accessor for datetime methods, alongside .cat for categoricals and the existing .str for string methods.

From this perspective, would it make sense to have geo-specific methods use a .geo attribute? e.g., s.geo.convex_hull instead of the current s.convex_hull.

Just thought I'd throw that out there for consideration, especially if tighter integration with pandas itself is desired at some point (e.g., to plug into the pandas internals at a lower level such that GeoSeries and GeoDataFrame are not necessary).
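A minimal sketch of what `s.geo.convex_hull` could look like, written against the accessor-registration hook that later versions of pandas made public (`pandas.api.extensions.register_series_accessor`). The accessor class `GeoAccessor` and the geometry representation (each element is a plain list of `(x, y)` tuples, hulled with Andrew's monotone chain) are assumptions for illustration only; this is not GeoPandas' implementation, which delegates geometry work to Shapely.

```python
import pandas as pd


def _convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


@pd.api.extensions.register_series_accessor("geo")
class GeoAccessor:
    """Hypothetical .geo namespace; each element is a list of (x, y) tuples."""

    def __init__(self, series):
        self._series = series

    @property
    def convex_hull(self):
        # Returns a new Series aligned on the original index.
        return self._series.map(_convex_hull)


s = pd.Series([[(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)], [(0, 0), (1, 1)]])
hull = s.geo.convex_hull
print(hull.iloc[0])  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Because the accessor only wraps the Series, `s.geo.convex_hull` returns a new Series on the same index, the same way `s.dt.year` does for datetimes.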

@jankatins

Not sure if that's the intended direction of this discussion, but Categorical can now be inserted into a DataFrame (where it serves as the column's backing store) just like a NumPy ndarray. So I think it should be possible to turn GeoSeries into a GeoArray (and probably also add a GeoIndex) and use pd.DataFrame directly. That would probably need changes merged into pandas itself; see the Categorical work in https://github.com/pydata/pandas/pulls?q=is%3Apr+label%3ACategorical+is%3Aclosed.
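The behaviour described here can be observed in current pandas (the 2014 internals have since been reworked): a Categorical column keeps its own array type as the backing store rather than being coerced to a plain ndarray. The snippet only shows the Categorical side; a GeoArray would need the same treatment.

```python
import pandas as pd

# A Categorical inserted as a DataFrame column keeps its own array type.
cat = pd.Categorical(["a", "b", "a"])
df = pd.DataFrame({"label": cat, "value": [1, 2, 3]})

backing = df["label"].array
print(type(backing).__name__)  # Categorical
print(df["label"].dtype)       # category
print(isinstance(backing, pd.api.extensions.ExtensionArray))  # True
```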

@jwass
Member

jwass commented Sep 19, 2014

This is something I'm really starting to support... especially after dealing with all the typing issues that seem to keep popping up and feel unavoidable.

We actually discussed and made a basic implementation of this back in #69, but ultimately decided against it. I think now that .cat and .dt are there, a .geo is only natural.

The only immediate questions that come to mind are how to store the extra information that needs to travel along with GeoSeries or a would-be GeoArray... particularly, the crs and rtree. Maybe this mechanism already exists, but not sure now... I'll have to take a deeper look at how Categorical and dt are implemented. The crs could be solved with something like Shapely #132, but not sure how to move the rtree around appropriately...

Hoping to hear from the other GeoPandans about this too.
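One way to frame the crs/rtree question, as a plain-Python sketch under stated assumptions: the hypothetical `GeoArray` below copies `crs` onto every derived array but deliberately does not copy its spatial index, because positions change on slicing; the index is rebuilt lazily on the first query instead. A uniform grid stands in for the R-tree, and points are plain `(x, y)` tuples; neither reflects GeoPandas' actual design.

```python
from collections import defaultdict
from math import floor


class GeoArray:
    """Hypothetical point container: crs propagates to derived arrays,
    while the spatial index is dropped and rebuilt lazily on demand."""

    def __init__(self, points, crs=None, cell_size=1.0):
        self.points = list(points)
        self.crs = crs
        self.cell_size = cell_size
        self._sindex = None  # built on the first query

    def __len__(self):
        return len(self.points)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Carry crs over, but not the index: positions are renumbered.
            return GeoArray(self.points[key], crs=self.crs, cell_size=self.cell_size)
        return self.points[key]

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    @property
    def sindex(self):
        # Grid index (stand-in for an R-tree): cell -> list of positions.
        if self._sindex is None:
            grid = defaultdict(list)
            for pos, (x, y) in enumerate(self.points):
                grid[self._cell(x, y)].append(pos)
            self._sindex = grid
        return self._sindex

    def query(self, minx, miny, maxx, maxy):
        """Positions of points inside the closed box [minx, maxx] x [miny, maxy]."""
        (ci0, cj0), (ci1, cj1) = self._cell(minx, miny), self._cell(maxx, maxy)
        hits = []
        for ci in range(ci0, ci1 + 1):
            for cj in range(cj0, cj1 + 1):
                for pos in self.sindex.get((ci, cj), ()):
                    x, y = self.points[pos]
                    if minx <= x <= maxx and miny <= y <= maxy:
                        hits.append(pos)
        return sorted(hits)


a = GeoArray([(0.5, 0.5), (2.5, 2.5), (1.2, 0.8)], crs="EPSG:4326")
print(a.query(0, 0, 1.5, 1.0))  # [0, 2]
b = a[1:]
print(b.crs, b.query(0, 0, 1.5, 1.0))  # EPSG:4326 [1]
```

Rebuilding the index rather than moving it trades a one-off rebuild cost for never having to renumber or invalidate a stale tree after a selection.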

@jankatins

Look at Categorical: if you want to put some object into a pandas.DataFrame, it needs to support the methods that the block manager calls on it (see pandas/core/categorical.py), plus a special Block class (see pandas/core/common.py). Beyond that, you need some special-casing in the rest of pandas, which you discover by actually using it :-) Have a look at pandas-dev/pandas#7217, which implemented most of it (two more fixups followed, which found places where pandas didn't handle categorical data).

@jankatins

s.dt is "only" an accessor for a DatetimeIndex(self), and s.cat is similar; the "categoricals as a block" part is in the PR above.
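In current pandas the relationship is easy to check: the `.dt` accessor reports the same element-wise values as a `DatetimeIndex` built from the same Series. This illustrates behaviour only; how pandas wires the accessor internally has changed since 2014.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2014-09-19", "2014-09-21"]))

# The .dt namespace exposes DatetimeIndex-style properties element-wise...
print(s.dt.day.tolist())                 # [19, 21]
# ...matching what a DatetimeIndex built from the same values reports.
print(pd.DatetimeIndex(s).day.tolist())  # [19, 21]
```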

@shoyer
Author

shoyer commented Sep 19, 2014

@JanSchulz This was indeed my intended direction for this discussion. Categorical is a very nice precedent, though I'm not sure if pandas is ready to support the necessary public API.

@shoyer
Author

shoyer commented Sep 19, 2014

@jwass Yes, I think the right approach would be to store the crs and rtree as GeoArray attributes. This would be sort of similar to how pandas handles datetime arrays, which have an associated timezone.

One other thing: when/if this does happen, it would be great if it could work for N-dimensional arrays, not just 1-D arrays (like Categorical currently). I can imagine multi-dimensional geoarrays working great for raster data... perhaps even some sort of mash-up with xray (my project for multi-dimensional pandas-like data structures).
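The timezone precedent can be seen directly in current pandas: the tz lives on the `DatetimeTZDtype`, so it survives slicing without extra bookkeeping. The analogy to a crs on a geometry array is the comment's; the snippet only shows the datetime side.

```python
import pandas as pd

# The timezone is part of the dtype, not stored per element.
s = pd.Series(pd.date_range("2014-09-19", periods=3, freq="D", tz="UTC"))

print(s.dt.tz)                 # UTC
print(s.iloc[1:].dt.tz)        # UTC: slicing keeps the tz metadata
print(type(s.dtype).__name__)  # DatetimeTZDtype
```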

@jankatins

@shoyer In Categorical, the public API is confined to the accessor (and Categorical itself). I think the more problematic part is that the rest of the codebase assumes in a few places that it gets a NumPy array, and those places would need special-casing (as with Categorical).

@kjordahl
Member

I'm coming around on the geo accessor, especially seeing how it can be consistent with other pandas usage. I'm less sure about the other attributes, we'll have to think on that a bit.

What I'd like to do as a next step is get a stable 0.2 release with spatial indexing and spatial joins, then we can look at doing some design for the next version.

@shoyer
Author

shoyer commented Sep 21, 2014

@JanSchulz The real problem, of course, is that there is no standard way to write NumPy ndarray-like objects that cannot or should not be actual ndarrays (e.g., because writing C would be overkill). Something like an abstract base class for ndarray-like objects. I feel that this is something more fundamental than pandas, but I don't even know where it belongs (numpy? blaze?).

@jankatins

Categorical is a subclass of PandasObject, which is not a subclass of np.ndarray, so no C or Cython is involved. There was some talk about implementing Categorical as a np.dtype.

@shoyer
Author

shoyer commented Sep 21, 2014

@JanSchulz Yep. I guess what I'm saying is, I wish it was easier to write custom dtypes, such as categorical and geo data without the C.

@shoyer
Author

shoyer commented Sep 21, 2014

The discussion about dtypes seems to be getting a little off-topic here, so I posted to the numpy mailing list: http://mail.scipy.org/pipermail/numpy-discussion/2014-September/071231.html

@tswast

tswast commented Apr 13, 2023

Looks like a duplicate of #680 (comment) (which has a more in-depth discussion)

@tswast tswast linked a pull request Apr 27, 2024 that will close this issue

5 participants