dtype.byteorder is not consistent during cross platform joblib.load() #1123
Comments
The same failure happens in Ubuntu 21.04 (in development), see: https://autopkgtest.ubuntu.com/packages/j/joblib/hirsute/s390x |
@pradghos @ginggs Unfortunately, I don't have access to a big-endian machine. How would the pickle standard library behave in a similar case? Does this issue in joblib cause data corruption (by badly interpreting the dtype) or just For instance, if you try with a small numpy array such as |
@ogrisel Sorry for the delayed response! As suggested, I tried the small numpy array.
Case 1: pickle. On the x86 system:
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import sys
>>> sys.byteorder
'little'
>>> import numpy as np
>>> A = np.arange(3, dtype=np.uint64)
>>> pickle.dump( A, open( "save_Array_64.p", "wb" ) )
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.dtype.byteorder
'='
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
0, 0], dtype=uint8)
>>>
Now loading the file on the s390x system:
Python 3.7.10 (default, Mar 1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> import sys
>>> sys.byteorder
'big'
>>> import numpy as np
>>> A = pickle.load( open( "save_Array_64.p", "rb" ) )
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 2], dtype=uint8)
>>> A.dtype.byteorder
'='
>>>
Now trying out the same scenario for case 2: joblib. On the x86 system:
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.byteorder
'little'
>>> import numpy as np
>>> A = np.arange(3, dtype=np.uint64)
>>> import joblib
>>> joblib.dump(A, 'save_Array_joblib_64.pkl')
['save_Array_joblib_64.pkl']
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.dtype.byteorder
'='
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
0, 0], dtype=uint8)
>>>
Copied the file to the s390x system:
Python 3.7.10 (default, Mar 1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> import numpy as np
>>> import sys
>>> sys.byteorder
'big'
>>> A = joblib.load('save_Array_joblib_64.pkl')
>>> A.view(np.uint64)
array([ 0, 72057594037927936, 144115188075855872],
dtype=uint64)
>>> A.dtype.byteorder
'<'
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
0, 0], dtype=uint8)
>>> |
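The numbers in the joblib transcript above can be reproduced on any machine. The following is a minimal sketch, not joblib's code: it assumes explicit `<u8` and `>u8` dtypes as stand-ins for the two platforms' native `uint64`, and the variable names are illustrative. It shows that the stored bytes are intact and that the large values come from reading little-endian bytes as big-endian.

```python
import numpy as np

# Explicit little-endian storage, as written on the x86 machine.
little = np.arange(3, dtype="<u8")

# Reinterpret the same bytes as big-endian. This mimics what
# A.view(np.uint64) does on s390x: no bytes move, only the dtype changes.
misread = little.view(">u8")

# Convert the values to big-endian storage instead (bytes are swapped).
converted = little.astype(">u8")

print(misread.tolist())    # [0, 72057594037927936, 144115188075855872]
print(converted.tolist())  # [0, 1, 2]
print(converted.view(np.uint8).tolist())  # byte layout seen with pickle on s390x
```

Here 72057594037927936 is 2**56, that is, the value 1 with its least significant byte read as the most significant one.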
Results after the fix (#1181), on the s390x system:
Python 3.7.10 (default, Mar 1 2021, 12:53:44)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import joblib
>>> import numpy as np
>>> import sys
>>> sys.byteorder
'big'
>>> A = joblib.load('save_Array_joblib_64.pkl')
>>> A.view(np.uint64)
array([0, 1, 2], dtype=uint64)
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 2], dtype=uint8)
>>> A.dtype.byteorder
'='
>>>
It also fixes the issue reported here -
|
With the fix from #1181, the tests now pass in Ubuntu 21.10 on little-endian (amd64): |
Any outlook on when the next release tag can be expected? Thanks, @ogrisel |
No, we don't know. This is one of the many things we would do right away if we had infinite time and energy 😉, but unfortunately we don't. Explaining why a release is important for you (maybe Ubuntu 21.10 has a strict schedule, or something else) could help us decide whether to prioritise a joblib release. |
This was fixed in #1181 |
dtype.byteorder is not consistent during cross-platform joblib.load() (big endian) of a file written on a different platform (little endian).
On little endian machine -
On Big Endian Machine (linux-s390x)
I think this is backward compatibility support: calling load_compatibility.
On the big-endian system, the test expects dtype '<i8' byteorder but receives 'int64'. As a result, the test_joblib_pickle_across_python_versions test fails on the s390x system.
Another question: after loading data from a file (written in little-endian order) on a big-endian system, dtype.byteorder in general stays '<' (little endian). Please advise: after joblib.load, should dtype.byteorder always be '=' (if read correctly)? Would that be a more consistent approach or not?
Thank you!
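To make the question concrete, here is a minimal sketch of normalizing a loaded array to native byte order, so that `dtype.byteorder` reports '='. It is an illustration under assumptions, not joblib's implementation or the change made in #1181: the helper name `to_native_byteorder` is hypothetical, and the "loaded" array is simulated by building one with the opposite byte order of the running machine.

```python
import sys
import numpy as np

def to_native_byteorder(array):
    """Hypothetical helper: return `array` as-is if native, else a converted copy."""
    if array.dtype.isnative:
        return array
    # astype swaps the bytes so the values are preserved in native order.
    return array.astype(array.dtype.newbyteorder("="))

# Simulate an array saved on a machine with the opposite endianness.
foreign = ">" if sys.byteorder == "little" else "<"
loaded = np.arange(3, dtype=foreign + "u8")

native = to_native_byteorder(loaded)
print(loaded.dtype.byteorder, native.dtype.byteorder)
print(native.tolist())
```

The conversion costs one copy for foreign-order arrays and nothing for native ones; the values compare equal before and after, only the in-memory layout and the reported byte order change.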