dtype.byteorder is not consistent during cross platform joblib.load() #1123

pradghos · 2020-11-05T15:31:14Z

dtype.byteorder is not consistent when joblib.load() is used across platforms: a file written on a little-endian platform and loaded on a big-endian platform.

On a little-endian machine:

Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> jb = joblib.load('joblib_0.11.0_pickle_py36_np111.pkl')
>>> jb[0].dtype
dtype('int64')
>>> jb[0].dtype.byteorder
'='
>>> jb1 = joblib.load('joblib_0.9.2_compressed_pickle_py34_np19.gz')
>>> jb1[0].dtype
dtype('int64')
>>> jb1[0].dtype.byteorder
'='
>>>

On a big-endian machine (linux-s390x):

(sklearn-test2) [root@plexors1 data]# python
Python 3.7.4 (default, Oct  7 2020, 05:24:07)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> jb = joblib.load('joblib_0.11.0_pickle_py36_np111.pkl')
>>> jb[0].dtype
dtype('<i8')
>>> jb[0].dtype.byteorder
'<'

This is probably because the dtype is stored as little-endian in joblib_0.11.0_pickle_py36_np111.pkl.

I think the next file goes through the backward-compatibility support (load_compatibility). On the big-endian system, the test expects dtype '<i8' but receives 'int64':

>>> jb1 = joblib.load('joblib_0.9.2_compressed_pickle_py34_np19.gz')
>>> jb1[0].dtype
dtype('int64')
>>> jb1[0].dtype.byteorder
'='
>>>

As a result, test_joblib_pickle_across_python_versions fails on the s390x system.
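A likely reason the same file looks fine on x86 but not on s390x: NumPy reports an explicit byte order that matches the host as '=', and two dtypes that differ only in byte order compare unequal. A minimal sketch in plain NumPy (no joblib involved) that shows this on whichever host runs it:

```python
import sys

import numpy as np

little_i8 = np.dtype('<i8')    # explicit little-endian 64-bit int, as in the test data
native_i8 = np.dtype('int64')  # native 64-bit int on the current host

# On a little-endian host NumPy normalizes '<' to '=', so the two dtypes are equal.
# On a big-endian host '<' stays '<', and the comparison is False.
print(sys.byteorder, repr(little_i8.byteorder), little_i8 == native_i8)
```

On x86 this should print little '=' True; on s390x it should print big '<' False, consistent with the transcripts above.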

Another question: after loading data from a file written in little-endian order on a big-endian system, dtype.byteorder generally stays '<' (little-endian). Please advise: should dtype.byteorder always be '=' after joblib.load() (if the data is read correctly)? Would that be a more consistent approach?
Thank you!


ginggs · 2020-12-07T12:44:59Z

The same failure happens in Ubuntu 21.04 (in development), see:

https://autopkgtest.ubuntu.com/packages/j/joblib/hirsute/s390x

ogrisel · 2020-12-15T16:52:54Z

@pradghos @ginggs I don't have access to a big endian machine unfortunately. How would the pickle standard library behave in a similar case?

Does this issue in joblib cause data corruption (by misinterpreting the dtype), or is it just an unexpected byte order?

For instance, if you try with a small numpy array such as np.arange(3, dtype=np.uint64), do you get array([0, 1, 2]) with only an unexpected byte order, or do you also get wrong values in the array?
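One way to tell the two situations apart without a big-endian machine is to simulate a foreign byte order on a single host. A minimal sketch in plain NumPy; check_loaded is a hypothetical helper, not a joblib or NumPy API:

```python
import numpy as np


def check_loaded(loaded, expected):
    """Hypothetical helper: separate value corruption from a byte-order label."""
    return {
        # Compares values; NumPy handles differing byte orders for ==.
        "values_ok": bool(np.array_equal(loaded, expected)),
        # False when the dtype carries a non-native byte order (e.g. '<' on s390x).
        "native_byteorder": bool(loaded.dtype.isnative),
    }


expected = np.arange(3, dtype=np.uint64)
swapped = expected.dtype.newbyteorder('S')  # opposite of the host byte order

# Same values, foreign byte-order label: a correct but unnormalized load.
relabeled = expected.astype(swapped)
# Same bytes read with the wrong byte order: actual value corruption.
misread = expected.view(swapped)

print(check_loaded(relabeled, expected))  # values_ok True, native_byteorder False
print(check_loaded(misread, expected))    # values_ok False, native_byteorder False
```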


pradghos · 2021-05-05T10:01:40Z

@ogrisel Sorry for the delayed response!

As suggested, I tried the small numpy array, i.e. np.arange(3, dtype=np.uint64), in both the pickle and the joblib cases.

Case 1: pickle

On the x86 system:

Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import sys
>>> sys.byteorder
'little'
>>> import numpy as np
>>> A = np.arange(3, dtype=np.uint64)
>>> pickle.dump( A, open( "save_Array_64.p", "wb" ) )
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.dtype.byteorder
'='
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 0], dtype=uint8)
>>>

The file save_Array_64.p was then moved to a Linux s390x (big-endian) system and loaded there.

On the s390x system:

Python 3.7.10 (default, Mar  1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import sys
>>> sys.byteorder
'big'
>>> import numpy as np
>>> A = pickle.load( open( "save_Array_64.p", "rb" ) )
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 2], dtype=uint8)
>>> A.dtype.byteorder
'='
>>>

Now the same scenario with joblib.

Case 2: joblib

On the x86 system:

Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.byteorder
'little'
>>> import numpy as np
>>> A = np.arange(3, dtype=np.uint64)
>>> import joblib
>>> joblib.dump(A, 'save_Array_joblib_64.pkl')
['save_Array_joblib_64.pkl']
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.dtype.byteorder
'='
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 0], dtype=uint8)
>>>

I copied save_Array_joblib_64.pkl from the x86 system to the s390x (big-endian) system.

On the s390x system:

Python 3.7.10 (default, Mar  1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> import numpy as np
>>> import sys
>>> sys.byteorder
'big'
>>> A = joblib.load('save_Array_joblib_64.pkl')
>>> A.view(np.uint64)
array([                 0,  72057594037927936, 144115188075855872],
      dtype=uint64)
>>> A.dtype.byteorder
'<'
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 0], dtype=uint8)
>>>
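The large values in the native uint64 view are consistent with little-endian bytes being read in big-endian order: 72057594037927936 is 1 << 56 and 144115188075855872 is 2 << 56, while the uint8 view still shows the little-endian layout. A minimal, host-independent sketch with explicit byte orders (plain NumPy, no joblib) that reproduces these numbers:

```python
import numpy as np

# Bytes of [0, 1, 2] in little-endian layout, as written on the x86 machine.
# The explicit '<' makes this independent of the host running the sketch.
raw = np.arange(3, dtype='<u8').tobytes()

# Interpreting the bytes with the order they were written in recovers the values.
as_written = np.frombuffer(raw, dtype='<u8')

# Interpreting the same bytes as big-endian (what a native uint64 view does on
# s390x) moves each small value into the most significant byte.
misread = np.frombuffer(raw, dtype='>u8')

print(list(raw))            # same byte layout as the x86 A.view(np.uint8) output
print(as_written.tolist())  # [0, 1, 2]
print(misread.tolist())     # [0, 72057594037927936, 144115188075855872]
```

This suggests that, before the fix, the bytes on s390x were intact and the large numbers came from viewing '<u8' data through the native (big-endian) uint64 dtype.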

pradghos · 2021-05-05T10:05:00Z

Results after the fix in #1181:

Python 3.7.10 (default, Mar  1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> import numpy as np
>>> import sys
>>> sys.byteorder
'big'
>>> A = joblib.load('save_Array_joblib_64.pkl')
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 2], dtype=uint8)
>>> A.dtype.byteorder
'='
>>>

It also fixes the originally reported test failure:

test_numpy_pickle.py::test_joblib_pickle_across_python_versions PASSED
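The post-fix behaviour (byte order '=', values unchanged, and the uint8 view now in big-endian layout) corresponds to converting the loaded array to the host's byte order. A minimal sketch of that idea in plain NumPy; to_native_byteorder is a hypothetical helper, and this is not the code from #1181:

```python
import numpy as np


def to_native_byteorder(arr):
    """Hypothetical helper: return arr with a native-byte-order dtype, values unchanged."""
    if arr.dtype.isnative:
        return arr  # already native, or byte order not applicable (e.g. uint8, object)
    # astype() converts the values, swapping the underlying bytes as needed.
    return arr.astype(arr.dtype.newbyteorder('='))


# Simulate an array that arrived with the opposite of the host's byte order.
foreign = np.arange(3, dtype=np.uint64).astype(np.dtype(np.uint64).newbyteorder('S'))
fixed = to_native_byteorder(foreign)
print(fixed, fixed.dtype.byteorder)
```

Using astype() rather than view() is the important choice here: a view keeps the bytes and only changes how they are interpreted, which is the misreading seen before the fix.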

ginggs · 2021-05-05T11:02:16Z

With the fix from #1181, the tests now pass in Ubuntu 21.10 on little-endian (amd64):
https://autopkgtest.ubuntu.com/results/autopkgtest-impish-ginggs-testing/impish/amd64/j/joblib/20210505_105137_9fcee@/log.gz
and big-endian (s390x):
https://autopkgtest.ubuntu.com/results/autopkgtest-impish-ginggs-testing/impish/s390x/j/joblib/20210505_105341_26fe5@/log.gz

pradghos · 2021-06-17T15:11:42Z

@ogrisel @lesteve When will joblib's next release be? Any pointers would really help. Thanks!

pradghos · 2021-07-12T04:55:29Z


@ogrisel @lesteve Any advice on this? Thank you!

ravigumm · 2021-07-15T10:48:14Z

Any outlook on when the next release can be expected? Thanks. @ogrisel

potula-chandra · 2021-07-15T11:15:16Z

Is there a known release plan? Thanks much in advance @ogrisel @lesteve

lesteve · 2021-07-22T12:47:25Z

No, we don't know. This is one of the many things that we would do right away if we had infinite time and energy 😉, unfortunately we don't ...

Explaining why a release would be important for you (maybe Ubuntu 21.10 has a strict schedule, or maybe something else) could help us decide whether we should prioritise a joblib release.

ginggs · 2021-07-22T15:38:30Z

@lesteve there's no hurry from the Debian/Ubuntu side. We are carrying @pradghos's fix as a patch, and it will be included in the upcoming Debian 11 and Ubuntu 21.10 releases.

lesteve · 2021-09-13T14:32:19Z

This was fixed in #1181.

lesteve · 2021-10-07T14:51:08Z

FYI the 1.1.0 release has just been uploaded to PyPI.

ogrisel added the bug label Dec 15, 2020

pradghos added a commit to pradghos/joblib that referenced this issue May 5, 2021

Fixing byte-order consistency/missmatch for cross-endian platform

452d4cd

- Addressing joblib#1123

pradghos mentioned this issue May 5, 2021

Fixing byte-order consistency/mismatch for cross-endian platform #1181

Merged

lesteve closed this as completed Sep 13, 2021

fcharras mentioned this issue Apr 4, 2024

Disable endianness alteration on unserialization of numpy arrays in joblib.Parallel #1561

Open
