Tree pickle portability between 64bit and 32bit arch #19602

Closed
aryanxk02 opened this issue Mar 3, 2021 · 23 comments · Fixed by #21552

Comments

@aryanxk02

aryanxk02 commented Mar 3, 2021

Decision Tree Classifier: I get the following error when loading the pickled model through my Flask app:

C:\Users\Aryan\Desktop\heart-host>python app.py
C:\Users\Aryan\AppData\Local\Programs\Python\Python38-32\lib\site-packages\sklearn\utils\deprecation.py:144: FutureWarning: The sklearn.tree.tree module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
  warnings.warn(message, FutureWarning)
Traceback (most recent call last):
  File "app.py", line 5, in <module>
    heart_dtc = pickle.load(open('dtc.pkl', 'rb'))
  File "sklearn\tree\_tree.pyx", line 607, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'
@glemaitre
Member

Are you running the same version of scikit-learn as the one used to create the pkl?

@aryanxk02
Author

aryanxk02 commented Mar 3, 2021

sklearn version: 0.22.2
YES

@glemaitre
Member

Could you as well give information about the OS and CPU architecture where the model was trained and where it is deployed?

@ogrisel
Member

ogrisel commented Mar 3, 2021

From the type error, one could guess that the model was pickled with Python 32 bit on Windows and loaded on Python 64 bit (e.g. on a Linux server). Is this the case?

If this is confirmed, we should probably consider this a bug of scikit-learn as @rth recently hinted in another issue or PR (that I failed to find).

However we would need to make sure that this bug has not already been fixed. @aryanxk02, can you try to check if you can reproduce the problem with scikit-learn 0.24 or, even better, a nightly build?

https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds

@aryanxk02
Author

> Could you as well give information about the OS and CPU architecture where the model was trained and where it is deployed?

currently on Windows 64 bit Operating System

@aryanxk02
Author

> From the type error, one could guess that the model was pickled with Python 32 bit on Windows and loaded on Python 64 bit (e.g. on a Linux server). Is this the case?
>
> If this is confirmed, we should probably consider this a bug of scikit-learn as @rth recently hinted in another issue or PR (that I failed to find).
>
> However we would need to make sure that this bug has not already been fixed. @aryanxk02, can you try to check if you can reproduce the problem with scikit-learn 0.24 or, even better, a nightly build?
>
> https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds

Okay, I developed the entire code (including training the ML model and saving it to a pickle file) and ran it on 32-bit Python through Jupyter Notebook on Windows.

Sure, I will try scikit-learn 0.24 and report back.

@ogrisel
Member

ogrisel commented Mar 3, 2021

What version / bitness of Python and OS do you use to run the flask app?

@aryanxk02
Author

> What version / bitness of Python and OS do you use to run the flask app?

  1. Python 3.8 - 32 bit
  2. Windows 10 - 64 bit

@ogrisel
Member

ogrisel commented Mar 3, 2021

If you use the same version and bitness of Python both for training the model and for running the flask app, then I do not understand how you can get this error.

Can you please run import sklearn; sklearn.show_versions() both in your flask app and in your jupyter notebook to check this?

@ogrisel
Member

ogrisel commented Mar 3, 2021

Also, can you check that you can run pickle.load(open('dtc.pkl', 'rb')) without error in your notebook right after saving the model to disk?
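
For illustration, a minimal roundtrip check along those lines (a sketch, not code from this thread; dtc.pkl is the file name from the traceback above):

import pickle
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Sketch: verify the pickle loads back in the same environment right
# after saving it, before deploying it anywhere else.
X, y = load_iris(return_X_y=True)
dtc = DecisionTreeClassifier().fit(X, y)

with open('dtc.pkl', 'wb') as f:
    pickle.dump(dtc, f)
with open('dtc.pkl', 'rb') as f:
    restored = pickle.load(f)
print(restored.score(X, y))  # should run without the dtype error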

@rth
Member

rth commented Mar 3, 2021

An identical issue was reported earlier in #7891 when unpickling on 32-bit ARM, and in pyodide/pyodide#521

BTW, shouldn't size_t be unsigned? The error message says long long, unless the "unsigned" part is swallowed somewhere. At least on 64-bit Linux, I get,

$ echo | gcc -E -xc -include 'stddef.h' - | grep size_t
typedef long unsigned int size_t;

In the above linked issue, I mentioned,

> In particular it sounds like Tree.n_classes returned a long long while it should have been size_t

but I no longer remember the reasoning behind that comment.

Edit: the WebAssembly VM is also little-endian, same as x86_64, so it's not a byte-order issue.

@aryanxk02
Author

aryanxk02 commented Mar 3, 2021

> If you use the same version and bitness of Python both for training the model and for running the flask app, then I do not understand how you can get this error.
>
> Can you please run import sklearn; sklearn.show_versions() both in your flask app and in your jupyter notebook to check this?

Hello, regarding this: I just checked my Python bitness in the Jupyter Notebook where I ran the entire code, and it was 64-bit. The one I used for Flask was 32-bit.
Thank you so much!

@rth
Member

rth commented Mar 4, 2021

Below are minimal steps to reproduce on 64-bit Linux with a 32-bit Docker image.

On the 64-bit Linux host,

conda create -n test-sklearn python=3.7.3
conda activate test-sklearn
pip install scikit-learn==0.24.1
mkdir tmp/
cd tmp/
python example.py

with example.py:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import joblib

iris = load_iris()
clf = RandomForestClassifier().fit(iris.data, iris.target)

joblib.dump(clf, 'estimator.pkl')

Then start a 32-bit Docker image,

docker run -it --rm -v "$(pwd):/shared" i386/debian
apt update
apt install python3 python3-venv
python3 -m venv venv
source venv/bin/activate
pip install scikit-learn==0.24.1
python -c "import joblib; joblib.load('/shared/estimator.pkl')"

produces,

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/venv/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.7/pickle.py", line 1085, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.7/pickle.py", line 1433, in load_reduce
    stack[-1] = func(*args)
  File "sklearn/tree/_tree.pyx", line 595, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

I'll try to investigate in more details later.

@rth rth added the Bug label Mar 4, 2021
@aryanxk02
Author

@rth any guidance on how to fix this for Windows OS would be appreciated sir.

@bayesfactor

Why is this closed? I'm still having this issue with a saved random forest classifier: when training/saving on 64-bit Python and loading/deploying on 32-bit Python, I get the error

ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

@aryanxk02
Author

> Why is this closed? I'm still having this issue with a saved random forest classifier: when training/saving on 64-bit Python and loading/deploying on 32-bit Python, I get the error
>
> ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

Nope, both interpreters should be the same, either 64-bit or 32-bit. But the same!

@rth rth changed the title Error while running model in pkl format through Flask Tree pickle portability between 64bit and 32bit arch Oct 13, 2021
@rth
Member

rth commented Oct 13, 2021

Indeed it's still a valid issue. Re-opening.

The general issue is that we should stop using size_t as dtype in objects that get serialized, since it has a different size depending on the architecture.
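
To illustrate (a minimal sketch, not from the thread): np.intp, which backs SIZE_t in the tree code, changes width with the platform, so integer arrays written by a 64-bit pickle stop matching the expected buffer type on a 32-bit build:

import numpy as np

# np.intp mirrors the C Py_ssize_t used for SIZE_t in sklearn/tree/_tree.pyx:
# its itemsize is 8 on a 64-bit build and 4 on a 32-bit one.
print(np.dtype(np.intp).itemsize)

# A pickle stores concrete-width integers, so an array dumped as int64 on a
# 64-bit machine no longer matches the SIZE_t buffer on a 32-bit machine.
n_classes = np.array([3], dtype=np.int64)  # what a 64-bit pickle contains
print(n_classes.dtype == np.intp)          # True on 64-bit, False on 32-bit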

@rth rth reopened this Oct 13, 2021
@lesteve
Member

lesteve commented Oct 13, 2021

> The general issue is that we should stop using size_t as dtype in objects that get serialized, since it has a different size depending on the architecture.

Having looked recently at #21237 (a somewhat related issue, but about differing endianness rather than 64-bit vs 32-bit), I think there will very likely be more issues down the line, for example here:

if (node_ndarray.dtype != NODE_DTYPE):
    # possible mismatch of big/little endian due to serialization
    # on a different architecture. Try swapping the byte order.
    node_ndarray = node_ndarray.byteswap().newbyteorder()
    if (node_ndarray.dtype != NODE_DTYPE):
        raise ValueError('Did not recognise loaded array dytpe')

  • node_ndarray.dtype is from the pickle and is a structured dtype containing 64-bit dtypes
  • NODE_DTYPE is from the machine where the pickle is loaded and is a structured dtype containing 32-bit dtypes

Both dtypes won't match and there will be an error.
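
A minimal sketch of that failure mode (hypothetical field layout, not the real NODE_DTYPE):

import numpy as np

# The pickle carries 64-bit index fields while a 32-bit build expects
# 32-bit ones; swapping the byte order never changes the field width,
# so the second dtype check in __setstate__ fails as well.
pickled_dtype = np.dtype([('left_child', np.int64), ('threshold', np.float64)])
native_dtype = np.dtype([('left_child', np.int32), ('threshold', np.float64)])

print(pickled_dtype == native_dtype)                 # False
print(pickled_dtype.newbyteorder() == native_dtype)  # still False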

@rth
Member

rth commented Oct 13, 2021

Hmm, yes, the endianness issue is a bit more complex. And it looks like #17644 might have created more issues than it solved. But what do you think we should do for 32/64-bit compatibility, which is probably easier to manage?

Aren't there portable dtypes defined in numpy we could use? Say, replace all usize_t with np.int32 (or unsigned long long)? But I'm not sure how efficient that would be for scalars.

@ogrisel
Member

ogrisel commented Oct 13, 2021

I think the endianness issue is (quite) orthogonal to the int precision issue: even if we had pickles from the same endianness, we would still have a problem with the handling of the precision.

The reason is that the Cython Node datastructure and therefore the NODE_DTYPE are platform specific:

ctypedef np.npy_float32 DTYPE_t          # Type of X
ctypedef np.npy_float64 DOUBLE_t         # Type of y, sample_weight
ctypedef np.npy_intp SIZE_t              # Type for indices and counters

cdef struct Node:
    # Base storage structure for the nodes in a Tree object

    SIZE_t left_child                    # id of the left child of the node
    SIZE_t right_child                   # id of the right child of the node
    SIZE_t feature                       # Feature used for splitting the node
    DOUBLE_t threshold                   # Threshold value at the node
    DOUBLE_t impurity                    # Impurity of the node (i.e., the value of the criterion)
    SIZE_t n_node_samples                # Number of samples at the node
    DOUBLE_t weighted_n_node_samples     # Weighted number of samples at the node

The np.npy_intp type (which is equivalent to ssize_t in C) is platform specific: on 32 bit platforms this is equivalent to np.int32 while it's np.int64 on 64 bit platforms.

Similarly the __cinit__(self, int n_features, np.ndarray[SIZE_t, ndim=1] n_classes, int n_outputs) constructor expects a n_classes numpy array with a platform specific dtype.

I think the simplest solution would be to put some conversion logic in a new __init__ method (to initialize the n_classes array) as well as in the __setstate__ method, to convert from int32 or int64 arrays (or recarray fields in the case of NODE_DTYPE) on the fly. The conversion itself should be a matter of calling .astype on those arrays with the expected dtypes: np.intp for n_classes, and NODE_DTYPE for the nodes in __setstate__.

Note that when both __cinit__ and __init__ are defined, both are called with the same arguments in that order (https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#initialisation-methods-cinit-and-init), so it should be OK to delay the initialization and conversion of the n_classes attribute to the __init__ method, in order to do the runtime type conversion there.
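
A rough sketch of that conversion idea (a hypothetical helper, not the actual change merged in #21552):

import numpy as np

def _coerce_node_array(node_ndarray, expected_dtype):
    # Hypothetical helper: cast a node array pickled on another
    # architecture to the dtype this build expects; .astype handles
    # both the 32/64-bit width change and the byte order.
    if node_ndarray.dtype == expected_dtype:
        return node_ndarray
    return node_ndarray.astype(expected_dtype)

In recent numpy versions, .astype on a structured array casts field by field when the field names match, so the same approach can cover the NODE_DTYPE recarray as well as the plain n_classes array.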

@thomasjpfan
Member

thomasjpfan commented Oct 13, 2021

For us to maintain code that enables portability between 32/64-bit systems, I think we need a new CI job for testing. The test would consist of:

  1. Build 64bit wheel.
  2. Train models in all_estimators in a 64bit platform and pickle them.
  3. Build 32bit wheel.
  4. Load models in a 32bit platform and predict/transform.

Side note: Numpy has recently decided not to have 32-bit manylinux wheels, so building a 32-bit wheel for scikit-learn may become harder.

@ogrisel
Member

ogrisel commented Oct 13, 2021

That's a possibility but it's quite heavy to set up and maintain.

A simpler solution would be to dump a bunch of potentially problematic models (e.g. Cython-based models such as kd-trees, ball trees, RFs, GBRT and HistGBRT, generated by a small helper script) on a 64-bit machine once in a while, and store the results in a folder of the GitHub repo (the models should be parameterized to be small, e.g. a few kB). Then we can write a test that checks we can call score and get the expected accuracy, and run it on both the 32-bit and 64-bit CIs. This would also be useful for being aware of the (lack of) forward compat of the pickles between consecutive scikit-learn versions. When the class structure changes we will have to regenerate some pickle files, but at least we will know when and what we break before our users complain about it, even if we do not support pickle compat between scikit-learn versions :)
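
A sketch of such a test (hypothetical folder name and score threshold, not the actual scikit-learn test suite; assumes the pickles were generated by the helper script on a 64-bit machine):

import glob

import joblib
import pytest
from sklearn.datasets import load_iris

# Sketch: estimators pickled once on a 64-bit machine and committed to the
# repo; this test runs on both the 32-bit and 64-bit CIs and checks the
# stored models still load and score as expected.
EXPECTED_MIN_SCORE = 0.9  # hypothetical threshold

@pytest.mark.parametrize('path', sorted(glob.glob('stored_pickles/*.pkl')))
def test_pickle_portability(path):
    X, y = load_iris(return_X_y=True)
    est = joblib.load(path)  # must not raise on any architecture
    assert est.score(X, y) >= EXPECTED_MIN_SCORE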

@ogrisel
Member

ogrisel commented Oct 13, 2021

> Side note: Numpy has recently decided not to have 32-bit manylinux wheels, so building a 32-bit wheel for scikit-learn may become harder.

If they stop doing that, then maybe we should stop doing this as well... But I suppose 32-bit platforms will still be supported via Debian and conda-forge packages, for instance.
