Large numpy arrays stored in big-endian format cannot be serialized, leading to errors with Parallel #1545
Comments
So I am going to guess that memmapping an array with non-native endianness is the issue here, but this needs more investigation.
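As a minimal illustration of the guess above (numpy only, not joblib's actual memmapping code): a memmap just reinterprets the raw bytes on disk, so a non-native byte order has to be carried in the dtype or the values come back wrong.

```python
import os
import tempfile

import numpy as np

# A big-endian int32 array (non-native on little-endian machines).
a = np.arange(1, 7, dtype=">i4").reshape(2, 3)

# Write the raw bytes to disk, as a memmap-backed dump would.
fd, path = tempfile.mkstemp()
os.close(fd)
a.tofile(path)

# Mapping the bytes back with the original dtype recovers the values;
# mapping them with the opposite byte order silently reinterprets them.
m_ok = np.memmap(path, dtype=">i4", mode="r", shape=(2, 3))
m_bad = np.memmap(path, dtype="<i4", mode="r", shape=(2, 3))

assert (m_ok == a).all()
assert not (m_bad == a).any()
```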
Did a bit of digging; the bug was most likely introduced in 0569c89.
Not related to 0569c89. Can be fixed by passing … Anyway,
Edit: actually it doesn't even work because the memmap is read-only.
So the bug was in fact likely introduced in 0fa2cb9. The changes to endianness in this PR were aimed at fixing the behavior of … The fix would be to just bypass the endianness standardization step for automated dump/load steps in … I suspect this will also affect the endianness of small arrays received by joblib workers, when the endianness of the arrays in the main worker wasn't the same as the system endianness... which is technically a breaking change, even if the use case is very marginal.
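The breaking-change concern above can be illustrated with plain numpy. The cast below is a stand-in for an endianness standardization step, not joblib's actual code: the values survive, but the dtype a worker would receive is native rather than the big-endian one the caller sent.

```python
import numpy as np

# A big-endian array, as a caller might pass to joblib workers.
x = np.arange(12, dtype=">i4").reshape(4, 3)

# Stand-in for an endianness standardization step: cast to native
# byte order (illustration only, not joblib's implementation).
x_std = x.astype(x.dtype.newbyteorder("="))

# The values are unchanged...
assert (x_std == x).all()
# ...but the byte order flag the worker sees is now native, not '>'.
assert x.dtype.byteorder == ">"
assert x_std.dtype.byteorder == "="
```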
Problem
The following snippet will fail with `joblib>=1.3.0` (and not with `joblib==1.2.0`), with the stacktrace reported at the end of this message. It only fails when `x` is large enough and stored in big-endian format. For instance,

- `x = np.random.randint(0, 100, (20, 3)).view(">i4")` (small array, big-endian format)
- `x = np.random.randint(0, 100, (200000, 3))` (large array, little-endian format)

will both run without error.
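The two properties that have to combine for the failure can be checked with numpy alone. The 1 MB threshold below corresponds to `Parallel`'s default `max_nbytes="1M"`, above which joblib memmaps the array.

```python
import numpy as np

# The failing combination: large enough to trigger memmapping
# (default max_nbytes is "1M") AND a non-native (big-endian) dtype.
x_fail = np.random.randint(0, 100, (200000, 3)).view(">i4")

x_small_be = np.random.randint(0, 100, (20, 3)).view(">i4")   # too small to memmap
x_large_le = np.random.randint(0, 100, (200000, 3))           # native byte order

assert x_fail.dtype.byteorder == ">"
assert x_fail.nbytes > 1_000_000
assert x_small_be.nbytes < 1_000_000
assert x_large_le.dtype.byteorder == "="
```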
Temporary solution

As @lesteve suggested, using `Parallel(n_jobs=2, max_nbytes=None)(...)` gets the snippet to run without error. Maybe we should change default parameters / error messages to guide users towards this solution?

Stacktrace