Cross endianness and bitness pickle issues with KNeighborsClassifier / KDTree #21553

Open
lesteve opened this issue Nov 4, 2021 · 2 comments

@lesteve
Member

lesteve commented Nov 4, 2021

The cross-endianness case was reported in #21237. There is a similar issue across bitness (a pickle written on 64-bit, loaded on 32-bit). To reproduce:

Generate a pickle on a 64-bit machine:

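import pickle

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(random_state=0)

# algorithm='kd_tree' forces a KDTree, whose pickled state triggers the failure
clf = KNeighborsClassifier(algorithm='kd_tree')
clf.fit(X, y)

# A context manager makes sure the file is flushed and closed
with open('/tmp/kneighbors.pkl', 'wb') as f:
    pickle.dump(clf, f)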

Open it on a 32-bit machine:

docker run -it -v /tmp:/io lesteve/i386-scikit-learn python3 -c 'import pickle; pickle.load(open("/io/kneighbors.pkl", "rb"))'

Output:

WARNING: The requested image's platform (linux/386) does not match the detected host platform (linux/amd64) and no specific platform was requested
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "sklearn/neighbors/_binary_tree.pxi", line 1062, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
    self._update_memviews()
  File "sklearn/neighbors/_binary_tree.pxi", line 1004, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
    self.idx_array = self.idx_array_arr
ValueError: Buffer dtype mismatch, expected 'ITYPE_t' but got 'long long'

The fix is quite likely similar to #21552 and #21539.
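
To illustrate the dtype mismatch in the traceback, here is a minimal sketch of the kind of conversion an unpickling path could apply to an index array before handing it to a typed memoryview. It assumes that ITYPE_t corresponds to the platform's pointer-sized integer (np.intp). The helper name as_native_intp is hypothetical and not part of scikit-learn, and this is not necessarily how #21552 or #21539 approach the problem.

```python
import numpy as np


def as_native_intp(arr):
    """Return `arr` as a C-contiguous, native-endian np.intp array.

    Hypothetical helper for illustration only (not scikit-learn API).
    Assumes the Cython side expects np.intp for index arrays.
    """
    arr = np.asarray(arr)
    if arr.dtype.kind not in "iu":
        raise TypeError(f"expected an integer array, got {arr.dtype}")
    # Refuse to silently truncate values that do not fit in the native type,
    # e.g. large 64-bit indices loaded on a 32-bit platform.
    info = np.iinfo(np.intp)
    if arr.size and (int(arr.min()) < info.min or int(arr.max()) > info.max):
        raise OverflowError("index values do not fit in the native np.intp")
    # The cast also normalises byte order, since np.intp is native-endian.
    return np.ascontiguousarray(arr, dtype=np.intp)


# Simulate an index array that came from a pickle with a foreign layout:
# big-endian 64-bit integers, regardless of the current platform.
foreign = np.arange(5, dtype=">i8")
native = as_native_intp(foreign)
```

The explicit range check is a design choice: raising on overflow seems safer than loading a tree whose indices were wrapped around.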

@rth
Member

rth commented Nov 4, 2021

I think it would be good to run your test from #21539 on all estimators to see which ones fail. That said, I expect the tree-based estimators and the neighbors estimators that use trees to be the most frequent source of this issue.

@lesteve
Member Author

lesteve commented Nov 4, 2021

I have this in mind eventually, but I am focusing on getting #21552 done first; we'll see how that pans out.

It seems this kind of problem happens when __setstate__, __getstate__ or __cinit__ are implemented in Cython. A quick pattern search seems to confirm what you are saying:

❯ ag -l 'def (__setstate__|__getstate__|__cinit__)' -G'(pyx|pxi)$'
sklearn/metrics/_dist_metrics.pyx
sklearn/tree/_utils.pyx
sklearn/tree/_tree.pyx
sklearn/tree/_criterion.pyx
sklearn/tree/_splitter.pyx
sklearn/neighbors/_binary_tree.pxi
sklearn/neighbors/_quad_tree.pyx
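
For anyone without ag installed, a rough Python equivalent of the search above might look like the sketch below. The function name find_state_hooks is made up for illustration. It matches on the .pyx and .pxi file suffixes rather than on ag's filename regex, so results could differ slightly in edge cases.

```python
import re
from pathlib import Path

# Same pattern as the ag command: Cython files defining one of these hooks.
HOOK_RE = re.compile(r"def (__setstate__|__getstate__|__cinit__)")
CYTHON_SUFFIXES = {".pyx", ".pxi"}


def find_state_hooks(root):
    """List Cython sources under `root` that define a pickling or init hook.

    Hypothetical helper; returns sorted paths relative to `root`.
    """
    root = Path(root)
    matches = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in CYTHON_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if HOOK_RE.search(text):
            matches.append(path.relative_to(root).as_posix())
    return sorted(matches)
```

Calling find_state_hooks("sklearn") from the repository root should give a list comparable to the ag output, in sorted order and relative to the sklearn directory.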
