
Pickle portability little ↔ big endian #21237

Closed
sgundura opened this issue Oct 4, 2021 · 9 comments


sgundura commented Oct 4, 2021

Describe the bug

We are trying to load AI models that were dumped on a little-endian machine using joblib, on AIX, which runs on the POWER architecture (big endian). Most of them worked, but we found two models that give errors. We tried to load them on an Ubuntu Linux that runs on POWER (big endian) and got the same error. We even tried building the latest nightly build of the sklearn module and still got this error. Attaching the programs we used to dump and load.
The two models that didn't work are:

  1. KNearestNeighbor
  2. RandomForest
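As a quick way to confirm which byte order each machine uses, and which byte order a given numpy array carries, the standard `sys.byteorder` and `dtype.byteorder` attributes can be checked (a small diagnostic sketch, not part of the original repro scripts):

```python
import sys
import numpy as np

# Native byte order of this interpreter's platform: 'little' or 'big'
print(sys.byteorder)

# Byte order of a concrete array's dtype:
# '<' little-endian, '>' big-endian, '=' native, '|' not applicable
a = np.arange(3, dtype=np.float64)
print(a.dtype.byteorder)  # '=' for a freshly created array
```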

Steps/Code to Reproduce

KNearestNeighbor_Dump.py (to be run on a little-endian machine):

import joblib

# Load dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN

iris = load_iris()

X = iris.data
y = iris.target

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

knn = KNN(n_neighbors=3)

# train model
knn.fit(X_train, y_train)
result = knn.predict(X_test)
print(result)

dump_file = "./KNearestNeighbor.joblib"
tuple_pickle = (knn, X_test, result)
with open(dump_file, 'wb') as file_dump:
    joblib.dump(tuple_pickle, file_dump)

KNearestNeighbor_Load.py (to be run on a big-endian machine after copying the KNearestNeighbor.joblib file generated by the program above):

import joblib
from sklearn.neighbors import KNeighborsClassifier as KNN

dump_file = "./KNearestNeighbor.joblib"
with open(dump_file, 'rb') as file_reader:
    trained_model, testing_data, original_prediction = joblib.load(file_reader)

dump_result = trained_model.predict(testing_data)
print(dump_result)
print(original_prediction)

RandomForest_Dump.py (to be run on a little-endian machine):
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Creating the datasets
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13)

# Training the RandomForest
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Output
result = rf.predict(X_test)
print(result)

# Saving with joblib
job_obj = (rf, X_test, result)
joblib.dump(job_obj, "./random_forest.joblib")

RandomForest_Load.py (to be run on a big-endian machine, after copying the random_forest.joblib file generated by the program above):
import joblib

load_RF, X_test_job, result_orig = joblib.load("./random_forest.joblib")

result_test = load_RF.predict(X_test_job)
print(result_orig)
print(result_test)

Expected Results

No error is thrown.

Actual Results

python3 KNearestNeighborReadDump.py
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_vec_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
ValueError: Little-endian buffer not supported on big-endian compiler
Exception ignored in: 'sklearn.neighbors._dist_metrics.get_mat_ptr'
ValueError: Little-endian buffer not supported on big-endian compiler
Traceback (most recent call last):
  File "KNearestNeighborReadDump.py", line 6, in <module>
    trained_model, testing_data, original_prediction = joblib.load(file_reader)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 575, in load
    obj = _unpickle(fobj)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 329, in load_build
    Unpickler.load_build(self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1552, in load_build
    setstate(state)
  File "sklearn/neighbors/_binary_tree.pxi", line 1164, in sklearn.neighbors._kd_tree.BinaryTree.__setstate__
  File "sklearn/neighbors/_binary_tree.pxi", line 1105, in sklearn.neighbors._kd_tree.BinaryTree._update_memviews
  File "sklearn/neighbors/_binary_tree.pxi", line 204, in sklearn.neighbors._kd_tree.get_memview_DTYPE_2D
ValueError: Little-endian buffer not supported on big-endian compiler
python3 RandomForest_Load.py
/opt/freeware/lib64/python3.7/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
Traceback (most recent call last):
  File "RandomForest_Load.py", line 4, in <module>
    load_RF,X_test_job,reult_orig = joblib.load("../dumps/random_forest.joblib")
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/freeware/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/freeware/lib64/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "sklearn/tree/_tree.pyx", line 595, in sklearn.tree._tree.Tree.__cinit__
ValueError: Little-endian buffer not supported on big-endian compiler

Versions

System:
    python: 3.7.10 (default, Jun  1 2021, 05:23:20)  [GCC 8.3.0]
executable: /opt/freeware/bin/python3
   machine: AIX-2-00C581D74C00-powerpc-64bit-COFF

Python dependencies:
          pip: 20.1.1
   setuptools: 47.1.0
      sklearn: 0.24.2
        numpy: 1.20.3
        scipy: 1.6.3
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True
@amueller
Member

amueller commented Oct 6, 2021

Hi! Generally, pickle is not supposed to be platform independent, so this is expected behavior.
If you want to serialize to a cross-platform format, ONNX or PMML (is that still a thing?) might help.

@glemaitre
Member

However, on this specific topic of little vs. big endian: @lesteve, did we already encounter this issue and actually do something about it in joblib?

@lesteve
Member

lesteve commented Oct 7, 2021

I am guessing you have this joblib PR in mind: joblib/joblib#1181. I am not sure whether it fixes the problem reported here, but it may be worth a try (by installing the joblib development version).

@sgundura
Author

sgundura commented Oct 7, 2021

The following sklearn documentation says, "Aside for a few exceptions, pickled models should be portable across architectures assuming the same versions of dependencies and Python are used. If you encounter an estimator that is not portable please open an issue on GitHub": https://scikit-learn.org/stable/modules/model_persistence.html

So I thought sklearn supported dumps across little-endian and big-endian architectures.

Also, I found that the following issue fixed a similar problem with the GradientBoostingClassifier model: #17644

And we saw that the GradientBoostingClassifier model works fine. Can a similar fix be done for the other two models that we found are not working?

@rth
Member

rth commented Oct 7, 2021

Generally pickle is not supposed to be platform independent, so this is expected behavior.

@amueller They are portable (#19561, #17644 (comment)) aside from custom C structs that we serialize where we should probably be more careful.

RandomForest

GradientBoostingClassifier model: #17644

Yes, #17644 should have fixed it, I think, but apparently it didn't. It's the same issue with sklearn.tree._tree.Tree serialization.
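To illustrate why raw C-struct buffers in a pickle are byte-order sensitive (a minimal sketch using the stdlib `struct` module, not sklearn's actual serialization code), the same eight bytes decode to different integers depending on which endianness the reader assumes:

```python
import struct

# Pack the integer 1 as a little-endian 64-bit value...
raw = struct.pack("<q", 1)

# ...and decode the same bytes both ways.
print(struct.unpack("<q", raw)[0])  # 1
print(struct.unpack(">q", raw)[0])  # 72057594037927936, i.e. 2**56
```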

@lesteve
Member

lesteve commented Oct 8, 2021

@sgundura can you try using joblib==1.1.0 (released yesterday) to load the pickle? That may actually fix it. My understanding is that with joblib/joblib#1181, joblib loads arrays with native endianness, which avoids the dtype mismatch that #17644 was supposed to address.
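A minimal sketch of that idea, converting a loaded array to native byte order only when necessary (`to_native` is a hypothetical helper for illustration, not joblib's actual code):

```python
import numpy as np

def to_native(a):
    """Return `a` in native byte order, converting only if needed (hypothetical helper)."""
    if a.dtype.byteorder in ("=", "|"):  # already native, or byte order not applicable
        return a
    return a.astype(a.dtype.newbyteorder("="))

# Simulate an array written on a machine with the opposite byte order:
# newbyteorder("S") swaps relative to the current order, on any platform.
swapped = np.arange(4, dtype=np.dtype(np.float64).newbyteorder("S"))
native = to_native(swapped)
print(native.dtype.byteorder)  # '='
print(native.tolist())         # [0.0, 1.0, 2.0, 3.0]
```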

Longer story: I was able to get a similar error as follows.

The next snippet loads a pickle, so only run it if you think you can trust me. It contains the pickle generated inside the s390x docker image and should reproduce the error on a little-endian machine (so very likely on your machine):

import io
import joblib

joblib.load(io.BytesIO(b"\x80\x04\x951\x02\x00\x00\x00\x00\x00\x00\x8c\x15sklearn.tree._classes\x94\x8c\x16DecisionTreeClassifier\x94\x93\x94)\x81\x94}\x94(\x8c\tcriterion\x94\x8c\x04gini\x94\x8c\x08splitter\x94\x8c\x04best\x94\x8c\tmax_depth\x94K\x01\x8c\x11min_samples_split\x94K\x02\x8c\x10min_samples_leaf\x94K\x01\x8c\x18min_weight_fraction_leaf\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0cmax_features\x94N\x8c\x0emax_leaf_nodes\x94N\x8c\x0crandom_state\x94N\x8c\x15min_impurity_decrease\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x12min_impurity_split\x94N\x8c\x0cclass_weight\x94N\x8c\tccp_alpha\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0en_features_in_\x94K\x14\x8c\x0bn_features_\x94K\x14\x8c\nn_outputs_\x94K\x01\x8c\x08classes_\x94\x8c\x13joblib.numpy_pickle\x94\x8c\x11NumpyArrayWrapper\x94\x93\x94)\x81\x94}\x94(\x8c\x08subclass\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94\x8c\x05shape\x94K\x02\x85\x94\x8c\x05order\x94\x8c\x01C\x94\x8c\x05dtype\x94h\x1eh%\x93\x94\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01>\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x8c\nallow_mmap\x94\x88ub\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x95\x9a\x00\x00\x00\x00\x00\x00\x00\x8c\nn_classes_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x06scalar\x94\x93\x94h)C\x08\x00\x00\x00\x00\x00\x00\x00\x02\x94\x86\x94R\x94\x8c\rmax_features_\x94K\x14\x8c\x05tree_\x94\x8c\x12sklearn.tree._tree\x94\x8c\x04Tree\x94\x93\x94K\x14h\x1a)\x81\x94}\x94(h\x1dh h!K\x01\x85\x94h#h$h%h)h,\x88ub\x00\x00\x00\x00\x00\x00\x00\x02\x95J\x01\x00\x00\x00\x00\x00\x00K\x01\x87\x94R\x94}\x94(h\tK\x01\x8c\nnode_count\x94K\x03\x8c\x05nodes\x94h\x1a)\x81\x94}\x94(h\x1dh h!K\x03\x85\x94h#h$h%h&\x8c\x03V56\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94N(\x8c\nleft_child\x94\x8c\x0bright_child\x94\x8c\x07feature\x94\x8c\tthreshold\x94\x8c\x08impurity\x94\x8c\x0en_node_samples\x94\x8c\x17weighted_n_node_samples\x94t\x94}\x94(hHh&\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03h*NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x00\x86\x94hIhSK\x08\x86\x94hJhSK\x10\x86\x94hKh&\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03h*NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x18\x86\x94hLhZK \x86\x94hMhSK(\x86\x94hNhZK0\x86\x94uK8K\x01K\x10t\x94bh,\x88ub\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x11\xbf\xe0\xb4\xb0 \x00\x00\x00?\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d@Y\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xc0\x00\x00\x00\x00\x00\x00\x00?\xb8\xe8\xf1\x057\xb5\xf0\x00\x00\x00\x00\x00\x00\x00'@C\x80\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xc0\x00\x00\x00\x00\x00\x00\x00?\xd5w\x17/f\xfd\xde\x00\x00\x00\x00\x00\x00\x00=@N\x80\x00\x00\x00\x00\x00\x95,\x00\x00\x00\x00\x00\x00\x00\x8c\x06values\x94h\x1a)\x81\x94}\x94(h\x1dh h!K\x03K\x01K\x02\x87\x94h#h$h%hZh,\x88ub@I\x00\x00\x00\x00\x00\x00@I\x00\x00\x00\x00\x00\x00@B\x80\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@*\x00\x00\x00\x00\x00\x00@H\x00\x00\x00\x00\x00\x00\x95!\x00\x00\x00\x00\x00\x00\x00ub\x8c\x10_sklearn_version\x94\x8c\x060.24.1\x94ub."))

On my machine (little-endian) I get an error with joblib 1.0 and no error with joblib 1.1.

Error
~/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
    573         filename = getattr(fobj, 'name', '')
    574         with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 575             obj = _unpickle(fobj)
    576     else:
    577         with open(filename, 'rb') as f:

~/miniconda3/lib/python3.9/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
    502     obj = None
    503     try:
--> 504         obj = unpickler.load()
    505         if unpickler.compat_mode:
    506             warnings.warn("The file '%s' has been generated with a "

~/miniconda3/lib/python3.9/pickle.py in load(self)
   1210                     raise EOFError
   1211                 assert isinstance(key, bytes_types)
-> 1212                 dispatch[key[0]](self)
   1213         except _Stop as stopinst:
   1214             return stopinst.value

~/miniconda3/lib/python3.9/pickle.py in load_reduce(self)
   1587         args = stack.pop()
   1588         func = stack[-1]
-> 1589         stack[-1] = func(*args)
   1590     dispatch[REDUCE[0]] = load_reduce
   1591 

~/dev/scikit-learn/sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.__cinit__()
    588             return self._get_value_ndarray()[:self.node_count]
    589 
--> 590     def __cinit__(self, int n_features, np.ndarray[SIZE_t, ndim=1] n_classes,
    591                   int n_outputs):
    592         """Constructor."""

ValueError: Big-endian buffer not supported on little-endian compiler

Edit: I pushed a docker image, lesteve/s390x-scikit-learn (https://hub.docker.com/r/lesteve/s390x-scikit-learn), in case someone needs a big-endian docker image to reproduce this issue or a similar one in the future. You can use it like this, for example:

docker run lesteve/s390x-scikit-learn python3 -c 'import sklearn; print(sklearn.__version__); print(sklearn.__file__)'

@sgundura
Author

@lesteve, thanks for the suggestion. I will try updating joblib to the latest version and see if it resolves the issue.

@rth changed the title from "Some AI model dumps taken on little endian fail to load on big endian" to "Pickle portability little ↔ big endian" on Oct 13, 2021
@sgundura
Author

I tried the two non-working models after updating joblib to 1.1.0 (and also updating sklearn to 1.0). Now they both work fine. Thanks for the help! But I am not sure whether this worked because of the new version of sklearn or of joblib, as I updated both.

@lesteve
Copy link
Member

lesteve commented Oct 18, 2021

Thanks for the feedback, I am confident that the fix comes from the joblib upgrade.

Side-comment: in general pickles are not guaranteed to work when you update scikit-learn, see https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations.

I'll open an issue about #17644 since I am not sure it is needed any more (I don't even understand how this could fix anything if I am being honest).
