Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_coordinates(point) is 4x faster than (point.x, point.y) #1995

Open
Zaczero opened this issue Feb 21, 2024 · 5 comments
Open

get_coordinates(point) is 4x faster than (point.x, point.y) #1995

Zaczero opened this issue Feb 21, 2024 · 5 comments

Comments

@Zaczero
Copy link

Zaczero commented Feb 21, 2024

At this moment, this is not a bug report nor a feature request. I primarily want to highlight the fact that get_coordinates(point) is approximately 4x faster than using point.x, point.y.

I believe this matters substantially as the point.x, point.y pattern is very popular for retrieving Python float numbers for a given point. It is also quite often present in hot loops.

A search on GitHub for 'shapely ("point.x, point.y" OR "p.x, p.y")' returns 1.5k code results, many of which are inside of for loops.

Benchmark

Python 3.12.1
shapely 2.0.3
python -m timeit -s "from shapely import Point" -s "a = Point(12,34)" "a.x, a.y"
# 20000 loops, best of 5: 11.3 usec per loop
python -m timeit -s "from shapely import Point, get_coordinates" -s "a = Point(12,34)" "get_coordinates(a)[0]"
# 100000 loops, best of 5: 2.98 usec per loop
@Lash-L
Copy link

Lash-L commented Apr 2, 2024

Yeah this has been on my mind for a while. We use shapely a lot for work, and when we do our optimization passes, we always go through and just do x = point.x and then use x whenever we need the value. I wasn't aware of get_coordinates()

I do struggle to understand why a calculation needs to be done at all for x and y and why they aren't just stored. Perhaps i'm missing something basic, but operations aren't done in place that I can think of, so I don't fully get why you would need a calculation - but I also haven't dove into the GEOS library to see.

It definitely should at the very least be a note in the docs as it can be a significant slowdown to codebases.

@martinfleis
Copy link
Contributor

Given what shapely is doing here is calling respective GEOS functions here, the question boils down to why GEOSGeomGetX_r, GEOSGeomGetY_r is slower than GEOSGeom_getCoordSeq_r within GEOS itself.

Also note that the comparison above is not precise as doing a.x, a.y is calling get_x first and then get_y, while get_coordinates does only a single pass. I see two actionable items:

  1. Follow up with GEOS in https://github.com/libgeos/geos and ask the same question there.
  2. Use get_coordinates internally in Point.x and Point.y. At the same time, we should try to steer users towards get_coordinates or get_x when they need coordinates and do the loop as the performance boost of the vectorized operation will be much higher than the internal refactoring.

@jorisvandenbossche
Copy link
Member

FWIW, it is mostly the ufunc machinery that adds a (relatively speaking) significant overhead when called for a single geometry object (and get_coordinates is not a ufunc, so I suppose it has less of this overhead).

See #1021 for a general issue about the ufunc overhead and performance of scalar methods/attributes

I quickly tried what it would take to specifically optimize the Point coord access attributes, see #2035, which gets the .x down to ~200ns.
While it might be worth doing that for .x et al, because this is often used for code that works with scalar objects, we also don't want to encourage this kind of code too much, as often there are even more performant options available when rewriting your code to fully leverage the new vectorized capabilities of shapely >= 2.

@Lash-L
Copy link

Lash-L commented Apr 12, 2024

Wow - that's quite the speed up.

I don't fully understand how to use the ufunc functionality yet (I plan to learn it) but to me that seems like a very good improvement. Does that change make it so that it is no longer possible to use ufuncs for .x?

One thing I did notice while trying to understand the codebase (I'm still a far ways out) is that getX() in Point.cpp in geos checks isEmpty() {
throw error
}
get coordinate

However, getCoordinate() also checks isEmpty

I tried to setup a custom version of GEOS to test it but I'm doing something wrong and my python instance ends up not being able to find shapely.lib, so I need to try some more.

But I would think that removing one of those isEmpty checks should also give a nice speed up

Edit: wow just experimented with using ufunc and it's a game changer. I learned about STRTree a few months ago and that made a huge time difference for me, this is going to be another one. Thank you and all of the other maintainers for your great work on this package

@jorisvandenbossche
Copy link
Member

Does that change make it so that it is no longer possible to use ufuncs for .x?

No, the change is separate from the ufuncs, it just adds a specialized implementation for just the .x attribute that doesn't need to handle an array of geometries (which is what the ufuncs are good at)

One thing I did notice while trying to understand the codebase (I'm still a far ways out) is that getX() in Point.cpp in geos checks isEmpty() {

My guess is that this isEmpty check is very cheap onthe GEOS side and won't make that much of a difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants