
Pickle portability little ↔ big endian #21237

Closed
sgundura opened this issue Oct 4, 2021 · 9 comments


sgundura commented Oct 4, 2021

Describe the bug

We are trying to load AI models that were dumped on a little-endian machine using joblib, on AIX, which runs on the POWER architecture (big endian). Most of them worked, but we found two models that give errors. We tried to load them on an Ubuntu Linux that runs on POWER (big endian) and got the same error. We even tried building the latest nightly build of the sklearn module and still got this error. Attaching the programs we used to dump and load.
The two models that didn't work are:

  1. KNearestNeighbor
  2. RandomForest
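As a quick way to confirm which byte order each machine uses, and which byte order a given numpy array carries, the standard `sys.byteorder` and `dtype.byteorder` attributes can be checked (a small diagnostic sketch, not part of the original repro scripts):

```python
import sys
import numpy as np

# Native byte order of this interpreter's platform: 'little' or 'big'
print(sys.byteorder)

# Byte order of a concrete array's dtype:
# '<' little-endian, '>' big-endian, '=' native, '|' not applicable
a = np.arange(3, dtype=np.float64)
print(a.dtype.byteorder)  # '=' for a freshly created array
```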

Steps/Code to Reproduce

KNearestNeighbor_Dump.py (to be run on a little-endian machine):

import joblib

# Load dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN

iris = load_iris()

X = iris.data
y = iris.target

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

knn = KNN(n_neighbors=3)

# train model
knn.fit(X_train, y_train)
result = knn.predict(X_test)
print(result)

dump_file = "./KNearestNeighbor.joblib"
tuple_pickle = (knn, X_test, result)
with open(dump_file, 'wb') as file_dump:
    joblib.dump(tuple_pickle, file_dump)

KNearestNeighbor_Load.py (to be run on a big-endian machine after copying the KNearestNeighbor.joblib file generated by the program above):

import joblib
from sklearn.neighbors import KNeighborsClassifier as KNN

dump_file = "./KNearestNeighbor.joblib"
with open(dump_file, 'rb') as file_reader:
    trained_model, testing_data, original_prediction = joblib.load(file_reader)

dump_result = trained_model.predict(testing_data)
print(dump_result)
print(original_prediction)

RandomForest_Dump.py (to be run on a little-endian machine):
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Creating the datasets
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13)

# Training the RandomForest
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Output
result = rf.predict(X_test)
print(result)

# Saving with joblib
job_obj = (rf, X_test, result)
joblib.dump(job_obj, "./random_forest.joblib")

RandomForest_Load.py (to be run on a big-endian machine, after copying the random_forest.joblib file generated by the program above):
import joblib

load_RF, X_test_job, result_orig = joblib.load("./random_forest.joblib")

result_test = load_RF.predict(X_test_job)
print(result_orig)
print(result_test)

Expected Results

No error is thrown.

Actual Results

python3 KNearestNeighborReadDump.py
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_vec_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_mat_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
Traceback (most recent call last):
  File "KNearestNeighborReadDump.py", line 6, in <module>
    trained_model, testing_data, original_prediction = joblib.load(file_reader)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 575, in load
    obj = _unpickle(fobj)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 329, in load_build
    Unpickler.load_build(self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1552, in load_build
    setstate(state)
  File "sklearn/neighbors/_binary_tree.pxi", line 1164, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
  File "sklearn/neighbors/_binary_tree.pxi", line 1105, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
  File "sklearn/neighbors/_binary_tree.pxi", line 204, in sklearn.neighbors._kd_tree.get_memview_DTYPE_2D
ValueError: Little-endian buffer not supported on big-endian compiler
python3 RandomForest_Load.py
/opt/freeware/lib64/python3.7/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
Traceback (most recent call last):
  File "RandomForest_Load.py", line 4, in <module>
    load_RF,X_test_job,reult_orig = joblib.load("../dumps/random_forest.joblib")
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "sklearn/tree/_tree.pyx", line 595, in sklearn.tree._tree.Tree.__cinit__
ValueError: Little-endian buffer not supported on big-endian compiler

Versions

System:
    python: 3.7.10 (default, Jun  1 2021, 05:23:20)  [GCC 8.3.0]
executable: /opt/freeware/bin/python3
   machine: AIX-2-00C581D74C00-powerpc-64bit-COFF

Python dependencies:
          pip: 20.1.1
   setuptools: 47.1.0
      sklearn: 0.24.2
        numpy: 1.20.3
        scipy: 1.6.3
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True
@amueller
Member

amueller commented Oct 6, 2021

Hi! Generally, pickle is not supposed to be platform independent, so this is expected behavior.
If you want to serialize to a cross-platform format, ONNX or PMML (is that still a thing?) might help.

@glemaitre
Member

However, on this specific topic of little vs. big endian: @lesteve, did we already encounter this issue and actually do something about it in joblib?

@lesteve
Member

lesteve commented Oct 7, 2021

I am guessing you have this joblib PR in mind: joblib/joblib#1181. I am not sure whether it fixes the problem reported here, but it may be worth a try (by installing the joblib development version).

@sgundura
Author

sgundura commented Oct 7, 2021

The following sklearn documentation says, "Aside for a few exceptions, pickled models should be portable across architectures assuming the same versions of dependencies and Python are used. If you encounter an estimator that is not portable please open an issue on GitHub": https://scikit-learn.org/stable/modules/model_persistence.html

So I thought sklearn supported dumps across little-endian and big-endian architectures.

Also, I found that the following issue fixed a similar problem with the GradientBoostingClassifier model: #17644

And we saw that the GradientBoostingClassifier model works fine. Can a similar fix be done for the other two models that we found are not working?

@rth
Member

rth commented Oct 7, 2021

Generally pickle is not supposed to be platform independent, so this is expected behavior.

@amueller They are portable (#19561, #17644 (comment)) aside from custom C structs that we serialize where we should probably be more careful.

RandomForest

GradientBoostingClassifier model: #17644

Yes, #17644 should have fixed it, I think, but apparently it didn't. It's the same issue with sklearn.tree._tree.Tree serialization.
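To illustrate why raw C-struct buffers in a pickle are byte-order sensitive (a minimal sketch using the stdlib `struct` module, not sklearn's actual serialization code), the same eight bytes decode to different integers depending on which endianness the reader assumes:

```python
import struct

# Pack the integer 1 as a little-endian 64-bit value...
raw = struct.pack("<q", 1)

# ...and decode the same bytes both ways.
print(struct.unpack("<q", raw)[0])  # 1
print(struct.unpack(">q", raw)[0])  # 72057594037927936, i.e. 2**56
```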

@lesteve
Member

lesteve commented Oct 8, 2021

@sgundura can you try using joblib==1.1.0 (released yesterday) to load the pickle? That may actually fix it. My understanding is that with joblib/joblib#1181, joblib loads arrays with native endianness, which avoids the dtype mismatch that #17644 was supposed to address.
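A minimal sketch of that idea, converting a loaded array to native byte order only when necessary (`to_native` is a hypothetical helper for illustration, not joblib's actual code):

```python
import numpy as np

def to_native(a):
    """Return `a` in native byte order, converting only if needed (hypothetical helper)."""
    if a.dtype.byteorder in ("=", "|"):  # already native, or byte order not applicable
        return a
    return a.astype(a.dtype.newbyteorder("="))

# Simulate an array written on a machine with the opposite byte order:
# newbyteorder("S") swaps relative to the current order, on any platform.
swapped = np.arange(4, dtype=np.dtype(np.float64).newbyteorder("S"))
native = to_native(swapped)
print(native.dtype.byteorder)  # '='
print(native.tolist())         # [0.0, 1.0, 2.0, 3.0]
```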

Longer story: I was able to get a similar error as follows.

The next snippet loads a pickle, so only run it if you think you can trust me. It contains the pickle generated inside the s390x docker image and should reproduce the error on a little-endian machine (so very likely on your machine):

import io
import joblib

joblib.load(io.BytesIO(b"\x80\x04\x951\x02\x00\x00\x00\x00\x00\x00\x8c\x15sklearn.tree._classes\x94\x8c\x16DecisionTreeClassifier\x94\x93\x94)\x81\x94}\x94(\x8c\tcriterion\x94\x8c\x04gini\x94\x8c\x08splitter\x94\x8c\x04best\x94\x8c\tmax_depth\x94K\x01\x8c\x11min_samples_split\x94K\x02\x8c\x10min_samples_leaf\x94K\x01\x8c\x18min_weight_fraction_leaf\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0cmax_features\x94N\x8c\x0emax_leaf_nodes\x94N\x8c\x0crandom_state\x94N\x8c\x15min_impurity_decrease\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x12min_impurity_split\x94N\x8c\x0cclass_weight\x94N\x8c\tccp_alpha\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0en_features_in_\x94K\x14\x8c\x0bn_features_\x94K\x14\x8c\nn_outputs_\x94K\x01\x8c\x08classes_\x94\x8c\x13joblib.numpy_pickle\x94\x8c\x11NumpyArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\x08subclass\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94\x8c\x05shape\x94K\x02\x85\x94\x8c\x05order\x94\x8c\x01C\x94\x8c\x05dtype\x94h\x1eh%\x93\x94\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01>\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x8c\nallow_mmap\x94\x88ub\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x95\x9a\x00\x00\x00\x00\x00\x00\x00\x8c\nn_classes_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x06scalar\x94\x93\x94h)C\x08\x00\x00\x00\x00\x00\x00\x00\x02\x94\x86\x94R\x94\x8c\rmax_features_\x94K\x14\x8c\x05tree_\x94\x8c\x12sklearn.tree._tree\x94\x8c\x04Tree\x94\x93\x94K\x14h\x1a)\x81\x94}\x94(h\x1dh h!K\x01\x85\x94h#h$h%h)h,\x88ub\x00\x00\x00\x00\x00\x00\x00\x02\x95J\x01\x00\x00\x00\x00\x00\x00K\x01\x87\x94R\x94}\x94(h\tK\x01\x8c\nnode_count\x94K\x03\x8c\x05nodes\x94h\x1a)\x81\x94}\x94(h\x1dh h!K\x03\x85\x94h#h$h%h&\x8c\x03V56\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94N(\x8c\nleft_child\x94\x8c\x0bright_child\x94\x8c\x07feature\x94\x8c\tthreshold\x94\x8c\x08impurity\x94\x8c\x0en_node_samples\x94\x8c\x17weighted_n_node_samples\x94t\x94}\x94(hHh&\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03h*NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x00\x86\x94hIhSK\x08\x86\x94hJhSK\x10\x86\x94hKh&\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03h*NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x18\x86\x94hLhZK \x86\x94hMhSK(\x86\x94hNhZK0\x86\x94uK8K\x01K\x10t\x94bh,\x88ub\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x11\xbf\xe0\xb4\xb0 \x00\x00\x00?\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d@Y\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xc0\x00\x00\x00\x00\x00\x00\x00?\xb8\xe8\xf1\x057\xb5\xf0\x00\x00\x00\x00\x00\x00\x00'@C\x80\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xc0\x00\x00\x00\x00\x00\x00\x00?\xd5w\x17/f\xfd\xde\x00\x00\x00\x00\x00\x00\x00=@N\x80\x00\x00\x00\x00\x00\x95,\x00\x00\x00\x00\x00\x00\x00\x8c\x06values\x94h\x1a)\x81\x94}\x94(h\x1dh h!K\x03K\x01K\x02\x87\x94h#h$h%hZh,\x88ub@I\x00\x00\x00\x00\x00\x00@I\x00\x00\x00\x00\x00\x00@B\x80\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@*\x00\x00\x00\x00\x00\x00@H\x00\x00\x00\x00\x00\x00\x95!\x00\x00\x00\x00\x00\x00\x00ub\x8c\x10_sklearn_version\x94\x8c\x060.24.1\x94ub."))

On my machine (little-endian) I get an error with joblib 1.0 and no error with joblib 1.1.

Error
~/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
    573         filename = getattr(fobj, 'name', '')
    574         with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 575             obj = _unpickle(fobj)
    576     else:
    577         with open(filename, 'rb') as f:

~/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
    502     obj = None
    503     try:
--> 504         obj = unpickler.load()
    505         if unpickler.compat_mode:
    506             warnings.warn("The file '%s' has been generated with a "

~/miniconda3/lib/python3.9/pickle.py in load(self)
   1210                     raise EOFError
   1211                 assert isinstance(key, bytes_types)
-> 1212                 dispatch[key[0]](self)
   1213         except _Stop as stopinst:
   1214             return stopinst.value

~/miniconda3/lib/python3.9/pickle.py in load_reduce(self)
   1587         args = stack.pop()
   1588         func = stack[-1]
-> 1589         stack[-1] = func(*args)
   1590     dispatch[REDUCE[0]] = load_reduce
   1591 

~/dev/scikit-learn/sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.__cinit__()
    588             return self._get_value_ndarray()[:self.node_count]
    589 
--> 590     def __cinit__(self, int n_features, np.ndarray[SIZE_t, ndim=1] n_classes,
    591                   int n_outputs):
    592         """Constructor."""

ValueError: Big-endian buffer not supported on little-endian compiler

Edit: I pushed a docker image, lesteve/s390x-scikit-learn (https://hub.docker.com/r/lesteve/s390x-scikit-learn), in case someone needs a big-endian docker image to reproduce this issue or a similar one in the future. You can use it like this, for example:

docker run lesteve/s390x-scikit-learn python3 -c 'import sklearn; print(sklearn.__version__); print(sklearn.__file__)'

@sgundura
Author

@lesteve, thanks for the suggestion. I will try updating joblib to the latest version and see if it resolves the issue.

@rth changed the title from "Some AI model dumps taken on little endian fail to load on big endian" to "Pickle portability little ↔ big endian" on Oct 13, 2021
@sgundura
Author

I tried the two non-working models after updating joblib to 1.1.0 (and also updating sklearn to 1.0). Now they both work fine. Thanks for the help! But I am not sure whether this worked because of the new version of sklearn or of joblib, as I updated both.

@lesteve
Copy link
Member

lesteve commented Oct 18, 2021

Thanks for the feedback, I am confident that the fix comes from the joblib upgrade.

Side-comment: in general pickles are not guaranteed to work when you update scikit-learn, see https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations.

I'll open an issue about #17644 since I am not sure it is needed any more (I don't even understand how this could fix anything if I am being honest).
