Very slow MultiPoint initialization #1954
Comments
The non-vectorised check does indeed look like it could be improved (if only by iterating over the points directly rather than over a range of indices); see shapely/geometry/multipoint.py, line 53 at commit 9c852a6.
But if the speedup is important to them, users can choose multipoints instead of MultiPoint. As to why: MultiPoint is typed to accept any sequence, whereas multipoints expects an array-like argument. It (multipoints) is also wrapped in a multithreading decorator, and calls out to create_collection, which is implemented in a C extension.
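A minimal sketch of the two construction paths, assuming Shapely 2.x and NumPy (the random coordinates are only illustrative). Both produce equal geometries, but shapely.multipoints processes the whole coordinate array in compiled code instead of creating and checking one Point per row in Python.

```python
import numpy as np
import shapely
from shapely.geometry import MultiPoint

# Illustrative input: 1,000 random 2D coordinates, shape (n, 2)
rng = np.random.default_rng(0)
coords = rng.random((1000, 2))

# Class constructor: accepts any sequence, builds and checks each Point in Python
mp_class = MultiPoint(coords)

# Vectorized function: expects array-like input, does the work in compiled code
mp_func = shapely.multipoints(coords)

print(mp_class.equals(mp_func))  # True
print(len(mp_func.geoms))  # 1000
```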
Thanks for the explanation @JamesParrott! As you say, users can choose the faster variant, as I did. I guess there is no easy vectorization fix if the sequence can be heterogeneous. Maybe it would help to add a pointer to this option in the documentation or docstring? Feel free to close the issue if you like; I opened it only because I was so surprised by the almost 2000x performance difference compared to LineString initialization.
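On the heterogeneous-sequence concern, one possible shape for a fix is a fast path that tries to coerce the input to a numeric (n, 2) or (n, 3) array and falls back to the general constructor otherwise. This is a sketch only: make_multipoint is a hypothetical name, it assumes Shapely 2.x and NumPy, and its fast path skips MultiPoint's per-part empty check, so it is not a drop-in replacement.

```python
import numpy as np
import shapely
from shapely.geometry import MultiPoint, Point


def make_multipoint(points):
    """Hypothetical helper: vectorized path for numeric input, general path otherwise."""
    try:
        # Succeeds only for homogeneous numeric input such as [(x, y), ...]
        arr = np.asarray(points, dtype=float)
    except (TypeError, ValueError):
        # e.g. a list of Point objects, or a mix of Points and tuples
        return MultiPoint(points)
    if arr.ndim == 2 and arr.shape[0] > 0 and arr.shape[1] in (2, 3):
        # Fast path: one call into compiled code for all rows
        # (note: this skips MultiPoint's per-part empty check)
        return shapely.multipoints(arr)
    # Anything else (empty input, unusual shapes) goes through the class constructor
    return MultiPoint(points)


print(len(make_multipoint([(0, 0), (1, 1), (2, 2)]).geoms))  # 3
print(len(make_multipoint([Point(0, 0), Point(1, 1)]).geoms))  # 2
```

The try/except keeps the general constructor as the source of truth for anything that is not a clean coordinate array.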
The docs do mention ufuncs at the start. Even so, if multipoints is a ufunc, then yes, the docs could be much clearer. I would add something at the start of the Geometry / API section here: https://shapely.readthedocs.io/en/stable/geometry.html. Besides, Shapely already requires users to install NumPy, so the Shapely docs should prioritise the parts of the API that make the most of NumPy.
Thanks for the issue! It seems this can be improved indeed. While you have the option to use
It's true that this is still slower than
Yes, that's a good point. That page could definitely mention the vectorized functions for creating geometries. In addition, the docstrings of Point/LineString/Polygon/etc. could each mention the equivalent ufunc.
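As an illustration of the class/ufunc pairing such docstrings could point to, the sketch below (assuming Shapely 2.x and NumPy) builds the same geometries both ways. shapely.points, shapely.linestrings and shapely.polygons are the vectorized counterparts of Point, LineString and Polygon; given a stack of coordinates they return a NumPy array of geometries.

```python
import numpy as np
import shapely
from shapely.geometry import LineString, Point, Polygon

ring = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

# Each geometry class has a vectorized creation function in the top-level namespace
pairs = [
    (Point(1, 2), shapely.points([1, 2])),
    (LineString(ring[:3]), shapely.linestrings(ring[:3])),
    (Polygon(ring), shapely.polygons(ring)),
]
for from_class, from_func in pairs:
    print(type(from_func).__name__, from_class.equals(from_func))

# The vectorized form also accepts many geometries' coordinates at once
many = shapely.points(np.arange(6, dtype=float).reshape(3, 2))
print(many.shape)  # (3,)
```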
See #1961 for the patch.
Expected behavior and actual behavior.
I expect that initializing a large MultiPoint geometry is roughly as expensive as initializing a LineString from the same coordinates. Instead, it is ~2000x slower.
Steps to reproduce the problem.
(using IPython syntax for statement timing):
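For readers outside IPython, a rough plain-Python equivalent with the standard-library timeit module is sketched below. The point count is an arbitrary assumption and no timings are claimed; they depend on the machine, the Shapely version and the GEOS version.

```python
import timeit

import numpy as np
import shapely
from shapely.geometry import LineString, MultiPoint

# Arbitrary problem size for illustration
coords = np.random.default_rng(42).random((10_000, 2))


def best_time(func, repeat=3):
    """Best wall-clock time in seconds over `repeat` single calls of `func`."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


t_line = best_time(lambda: LineString(coords))
t_multi = best_time(lambda: MultiPoint(coords))
t_ufunc = best_time(lambda: shapely.multipoints(coords))

print(f"LineString(coords):          {t_line:.4f} s")
print(f"MultiPoint(coords):          {t_multi:.4f} s")
print(f"shapely.multipoints(coords): {t_ufunc:.4f} s")
print(f"MultiPoint / LineString:     {t_multi / t_line:.0f}x")
```

Taking the minimum of several single runs reduces noise from other processes, similar in spirit to what %timeit reports.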
Most of that extra time is in the non-vectorized check for empty geometries, which can be bypassed:
This is still about 50x slower than LineString; I haven't investigated why. But vectorizing that empty-geometry check seems like low-hanging fruit for a ~50x speedup.
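One way to vectorize the check, sketched under the assumption of Shapely 2.x and NumPy: build all points at once with shapely.points (or take an array of Point geometries as given), test them with a single shapely.is_empty call, and only then assemble the collection with shapely.multipoints. multipoint_checked is a hypothetical name, it raises ValueError rather than Shapely's own exception type, and it is not the code from the patch.

```python
import numpy as np
import shapely


def multipoint_checked(parts):
    """Hypothetical helper: build a MultiPoint with one vectorized empty-part check."""
    arr = np.asarray(parts)
    if arr.dtype == object:
        # Assume an array of Point geometries was passed
        pts = arr
    else:
        # Numeric coordinates: create every Point in one vectorized call
        pts = shapely.points(arr.astype(float))
    # A single is_empty call over all parts replaces the per-point Python loop
    if shapely.is_empty(pts).any():
        raise ValueError("Can't create MultiPoint with empty component")
    return shapely.multipoints(pts)


print(len(multipoint_checked([(0, 0), (1, 1)]).geoms))  # 2
try:
    multipoint_checked([shapely.Point(), shapely.Point(1, 1)])
except ValueError as exc:
    print(exc)  # Can't create MultiPoint with empty component
```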
Operating system
Ubuntu
Shapely version and provenance
Shapely v2.0.1 from conda-forge. I also checked the latest main branch, and the non-vectorized error check seems to still be present.