
Test decision tree pickle for different endianness #21539

Merged: 4 commits into scikit-learn:main on Nov 4, 2021

Conversation

@lesteve (Member) commented Nov 3, 2021

Reference Issues/PRs

Closes #21359

What does this implement/fix? Explain your changes.

This adds a test that checks that pickles containing big-endian arrays can be loaded on a little-endian machine.

The pickles are created by changing the pickler's dispatch_table, or by reimplementing NumpyPickler.save for joblib pickles. This produces pickles whose numpy arrays are stored in big-endian byte order, which does not exactly match a pickle created on a big-endian machine. The main advantage IMO is that we have some test for #17644 ...
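A minimal sketch of the dispatch_table approach, assuming a scikit-learn version whose trees can load such pickles (the helper name `reduce_ndarray`, the iris dataset, and the final assertion are illustrative, not taken from this PR's diff):

```python
import copyreg
import io
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def reduce_ndarray(arr):
    # Pickle the array after swapping its bytes and flipping the dtype's
    # byte order, so the payload uses the non-native endianness while the
    # values stay the same.
    swapped = arr.byteswap().view(arr.dtype.newbyteorder())
    return swapped.__reduce__()


X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Override how ndarrays are pickled without touching the global registry.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[np.ndarray] = reduce_ndarray
pickler.dump(clf)

buffer.seek(0)
clf_loaded = pickle.load(buffer)
np.testing.assert_array_equal(clf.predict(X), clf_loaded.predict(X))
```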

I thought it was simpler than using qemu inside a Docker image (see #21237 (comment) for more details), as mentioned in #19602 (comment).

Doing something similar for more (or even all) estimators, as mentioned in #19602 (comment), is left for a future PR.

@lesteve changed the title from "Cross architecture tree pickle" to "Test decision tree pickle for different endianness" on Nov 3, 2021
@ogrisel (Member) left a comment

LGTM!

And +1 for a similar test (+fix) for #19602 in a follow-up PR to support pickling of tree structures that have system-specific precision levels in their integer fields.

@lesteve (Member, Author) commented Nov 4, 2021

And +1 for a similar test (+fix) for #19602 in a follow-up PR to support pickling of tree structures that have system-specific precision levels in their integer fields.

Yep, I was working on it in parallel since it touches the same code; I opened a draft PR with a similar approach: #21552. It is better to merge this one first.

@rth (Member) left a comment

Very nice! Thanks a lot @lesteve!

Maybe this should be put somewhere in the common tests? Or at least in some folder where it would make sense to run it parametrized for a few estimators (I'm also thinking of KDTree/BallTree, #21553). Otherwise LGTM.
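A hypothetical skeleton of such a parametrized test (the estimator list and test name are assumptions, and the plain round-trip below would be replaced by the byte-swapping pickler sketched earlier):

```python
import io
import pickle

import numpy as np
import pytest
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


@pytest.mark.parametrize(
    "Estimator", [DecisionTreeClassifier, RandomForestClassifier]
)
def test_pickle_roundtrip(Estimator):
    # Fit, pickle, unpickle, and compare predictions for each estimator.
    X, y = load_iris(return_X_y=True)
    est = Estimator(random_state=0).fit(X, y)

    buf = io.BytesIO()
    pickle.dump(est, buf)
    buf.seek(0)

    est_loaded = pickle.load(buf)
    np.testing.assert_array_equal(est.predict(X), est_loaded.predict(X))
```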

@lesteve (Member, Author) commented Nov 4, 2021

Maybe this should be put somewhere in the common tests? Or at least in some folder where it would make sense to run it parametrized for a few estimators (I'm also thinking of KDTree/BallTree, #21553). Otherwise LGTM.

I do have this in mind eventually, but for now my current strategy is:

@rth (Member) commented Nov 4, 2021

OK, fair enough. Thanks!

@rth merged commit 46485a9 into scikit-learn:main on Nov 4, 2021
@lesteve deleted the cross-architecture-tree-pickle branch on November 4, 2021 13:49
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Successfully merging this pull request may close these issues.

Remove code introduced by #17644
3 participants