Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython and memviews creation #17299

Open
NicolasHug opened this issue May 21, 2020 · 5 comments
Open

Cython and memviews creation #17299

NicolasHug opened this issue May 21, 2020 · 5 comments

Comments

@NicolasHug
Copy link
Member

NicolasHug commented May 21, 2020

Not an issue, just a Cython-related PSA that we need to keep in mind when reviewing PRs:

We shouldn't create 1d views for each sample, this is slow:

cdef float X[:, :] = ...  # big 2d view
for i in range(n_samples):  # same with prange, same with or without the GIL
	f(X[i])

do this instead, or use pointers, at least for now:

for i in range(n_samples):
	f(X, i)  # and work on X[i, :] in f's code

This is valid for any pattern that generates lots of views so looping over features might not be a good idea either if we expect lots of features.
There might be a "fix" in cython/cython#2227 / cython/cython#3617

The reason is that there's a significant overhead when creating all these 1d views, which comes from Cython internal ref-counting (details at cython/cython#2987). In the hist-GBDT prediction code, this overhead amounts for more than 30% of the runtime so it's not negligible.

Note that:

  • Doing this with prange used to generate some additional Python interactions, but this was fixed in cython/cython@794d21d and backported to Cython 0.29
  • Now that no Python interactions are generated, we need to be extra careful with this because we won't even see it in Cython annotated files

CC @scikit-learn/core-devs

@glemaitre
Copy link
Member

Shall we pin this issue (at least to easily come back to the discussion)?

@NicolasHug NicolasHug pinned this issue May 27, 2020
@rth
Copy link
Member

rth commented May 28, 2020

Sounds like cython/cython#3617 might be a fix for some of it?

@NicolasHug
Copy link
Member Author

yeah, I updated the comment

@da-woods
Copy link

da-woods commented Jun 6, 2020

FYI I don't think cython/cython#3617 will really help with speed here - it just makes

for xi in X:
 f(xi)

equivalent (speed-wise) to

for i in range(n_samples):
 f(X[i])

You'd still be better off using your second version if you really need the best performance.

@jjerphan
Copy link
Member

Slicing memoryviews also comes with some more operations to setup new memoryviews. Even if there's little interactions with the CPython interpreter, this might still be costly, especially if slicing is used in nogil-routines that are called repeatedly.

Generally, I think creating memoryview might be costly in nogil context and one should probably access and operate on memory-views' elements directly (as illustrated by @NicolasHug in the description of this issue) as much as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants