Remove code introduced by #17644 #21359

lesteve · 2021-10-18T14:15:21Z

#17644 introduced code in Tree.__setstate__ to deal with pickles saved with a different endianness than that of the machine the pickle is loaded on.

#21237 showed that this fix was not working, since an error occurs in Tree.__cinit__ (i.e. before __setstate__). It is probably better to remove this tricky code if it is not useful.

Side-comment: this problem is avoided with joblib 1.1 which loads arrays in native endianness (same behaviour as pickle).

While I am at it, adding a test with a pickle generated on a big-endian machine would be nice.

ogrisel · 2021-10-20T09:08:54Z

I agree with the proposed strategy.

thomasjpfan · 2021-10-20T19:52:03Z

+1 on removing. Do we want to consider this a bug fix for 1.0.1 or 1.0.2?

lesteve · 2021-10-26T16:12:34Z

After some more investigation, #17644 is still needed when using pickle. joblib (version >= 1.1.0) always loads arrays in native endianness, whereas pickle keeps the endianness stored in the pickle for structured arrays:

import pickle
import numpy as np
import joblib

# Structured array whose fields are explicitly big-endian (">")
arr = np.array([(1, 2.0)], dtype=[("myint", ">i8"), ("myfloat", ">f8")])
print(f"original dtype: {arr.dtype}")

# Round trip through the standard pickle module (labelled "numpy" in the output)
numpy_dtype = pickle.loads(pickle.dumps(arr)).dtype
print(f"after numpy dump+load: {numpy_dtype}")

# Round trip through joblib
joblib.dump(arr, "/tmp/test.pkl")
joblib_dtype = joblib.load("/tmp/test.pkl").dtype
print(f"after joblib dump+load: {joblib_dtype}")

Output (on a little-endian machine, i.e. the most common case):

original dtype: [('myint', '>i8'), ('myfloat', '>f8')]
after numpy dump+load: [('myint', '>i8'), ('myfloat', '>f8')]
after joblib dump+load: [('myint', '<i8'), ('myfloat', '<f8')]

Side-comment: for simple dtypes (e.g. 'float64', 'int64', etc.), both pickle and joblib use native endianness (i.e. they don't respect the endianness of the pickle).

Why this matters in the context of #17644: NODE_DTYPE is a structured dtype, which looks like this on a little-endian machine:

dtype=[('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

node_array.dtype comes from the pickle and can have the opposite endianness if the tree was pickled on a big-endian machine and loaded with pickle.
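To illustrate the kind of conversion this requires, here is a minimal sketch of bringing a structured array with non-native fields back to native byte order using NumPy alone. It is not necessarily what #17644 does: the two-field dtype is a hypothetical stand-in for NODE_DTYPE, and to_native_byteorder is a made-up helper name.

```python
import numpy as np

# Hypothetical two-field dtype standing in for NODE_DTYPE, with explicitly
# big-endian fields as they might come out of a pickle.
big_endian_dtype = np.dtype([("feature", ">i8"), ("threshold", ">f8")])
node_array = np.array([(3, 0.5), (-2, -2.0)], dtype=big_endian_dtype)


def to_native_byteorder(arr):
    """Return arr with every field in the machine's native byte order."""
    # newbyteorder("=") applies to all fields of a structured dtype.
    native_dtype = arr.dtype.newbyteorder("=")
    # astype converts the values field by field; copy=False avoids a copy
    # when the array is already in native byte order.
    return arr.astype(native_dtype, copy=False)


native = to_native_byteorder(node_array)
print(f"before: {node_array.dtype}")
print(f"after:  {native.dtype}")
```

The values are preserved; only the in-memory byte layout of each field changes, so the converted array can be compared against a dtype defined on the loading machine.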

lesteve mentioned this issue Nov 3, 2021

Test decision tree pickle for different endianness #21539

Merged

rth closed this as completed in #21539 Nov 4, 2021
