Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimize z property of point #2032

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Lash-L
Copy link

@Lash-L Lash-L commented Apr 11, 2024

Slightly based off of conversation here: #1995

I noticed that getting .z was incredibly slow compared to get_coordinates.

This refactors .z to be significantly faster.

My test:

start = time.time()
for _ in range(100000):
    p = geo.Point(random.random(), random.random(), random.random())
    z = p.z
    f = p.z
    g = p.z
print(time.time() - start)

edit: Removed old comparisions

Nightly comparision:
Old: 4.941876173019409
New:3.2745301723480225

@mwtoews
Copy link
Member

mwtoews commented Apr 11, 2024

With upcoming development for shapely 2.1 with GEOS 3.12+ the third coordinate (i.e. coords[2]) with XYM geometries is the M coordinate (although not at the moment, as I'm currently still planning get_coodinates(). So this behavior may change soon.

Or, possibly get_coordinates() can be redesigned to have include_z and include_m to return an expected shape (N, 2), (N, 3) or (N, 4), where the coordinates are nan if not present. This can be done as well, as I'm still prototyping enhancements.

@Lash-L
Copy link
Author

Lash-L commented Apr 11, 2024

With upcoming development for shapely 2.1 with GEOS 3.12+ the third coordinate (i.e. coords[2]) with XYM geometries is the M coordinate (although not at the moment, as I'm currently still planning get_coodinates(). So this behavior may change soon.

Interesting. well I think i actually just found an approach that is even faster if you are open to it?

Basically it still uses get_z, but it skips the has_z. as get_z does that for us by returning nan. It's a bit faster. Getting 3.1s

I also realized my numbers for get_z above were incorrect. it is in the 5s, not the 10. Not sure what happened there.

@mwtoews
Copy link
Member

mwtoews commented Apr 11, 2024

Intriguing that has_z is slow, could be something to investigate further.

One surprise that I don't see from the existing tests is that newer GEOS versions allow NaN as valid coordinate values, i.e.:

In [1]: import shapely

In [2]: geom = shapely.from_wkt("POINT Z (1 2 NaN)")

In [3]: geom.has_z
Out[3]: True

In [4]: geom.z
Out[4]: nan

In [5]: shapely.get_coordinates(geom)
Out[5]: array([[1., 2.]])

which means this geometry should continue to allow access to .z (returning nan). Not sure why this failure didn't pop up.

@coveralls
Copy link

coveralls commented Apr 11, 2024

Pull Request Test Coverage Report for Build 8667800059

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 88.126%

Totals Coverage Status
Change from base Build 8388286001: 0.3%
Covered Lines: 2620
Relevant Lines: 2973

💛 - Coveralls

@Lash-L
Copy link
Author

Lash-L commented Apr 12, 2024

I wouldn't say has_z is super slow - but it def has some slowdown and some redundant things are happening if I am understanding things correctly.

This is in the case of getting z when we do have a z coord - I can do this same thing with when we don't have a z coord if oyu're interested.

But in the case that we do have a z coord, half of our time is spent in the has_z function.

Old method: 12791345
New method: 7440167
About a 40% speed up
image

One surprise that I don't see from the existing tests is that newer GEOS versions allow NaN as valid coordinate values,

That seems counter to what I see in the GEOS code(pulled from main iirc)
image

I could be misunderstanding something(still trying to get a grasp of GEOS and my cpp is rusty), but it seems to still check if it is nan.

Although I guess if hasz is true, the nan check doesn't matter.

I'm going to try to learn how to build a newer version of GEOS locally for shapely so I can play with things

@jorisvandenbossche
Copy link
Member

I could be misunderstanding something(still trying to get a grasp of GEOS and my cpp is rusty), but it seems to still check if it is nan.

Although I guess if hasz is true, the nan check doesn't matter.

Yeah, depending on how the Point is created, it "knows" its dimension, and it has stored whether it has a z coord in m_hasz. It's only when that information is not available that it falls back to checking the third coordinate being NaN or not.

That means that with recent GEOS, you can explicitly create a 3D POINTZ (m_hasz will be set) with a NaN as z coord.

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants