Random class_tracker_id for dynamic class #510

Open
gmcatsf opened this issue Jul 19, 2023 · 1 comment


gmcatsf commented Jul 19, 2023

cloudpickle generates random UUIDs to track dynamic classes, and those UUIDs end up in the pickled output. For example, when a class is serialized by value, pickletools.dis shows lines like the following:

  571: s                    SETITEM
  572: \x8c                 SHORT_BINUNICODE 'fa5abda803d644e0bdcfdffec5c8f8d6'
  606: \x94                 MEMOIZE    (as 56)

This string is the class_tracker_id that cloudpickle assigns to the class. Because it is random, the binary output differs from run to run even when the code has not changed.
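To illustrate the effect without depending on cloudpickle itself, here is a minimal stdlib-only sketch. The helper `dump_with_random_id` is hypothetical and only stands in for a serializer that embeds a fresh `uuid.uuid4().hex` value next to otherwise identical data; it is not cloudpickle's actual code path.

```python
import pickle
import pickletools
import uuid


def dump_with_random_id(payload):
    # Hypothetical stand-in: attach a fresh random id to the payload,
    # the way a random tracker id ends up inside a pickle.
    state = {"payload": payload, "class_tracker_id": uuid.uuid4().hex}
    return pickle.dumps(state, protocol=4)


first = dump_with_random_id("same input")
second = dump_with_random_id("same input")

# Same input, same code, but the bytes differ because of the random id.
print(first == second)  # False

# The disassembly shows the id as a SHORT_BINUNICODE string.
pickletools.dis(first)
```

The two outputs have the same length and structure; only the 32-character hex string differs, which is enough to defeat byte-level comparison or caching of the pickle.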

Can random ids be replaced with deterministic ids, say a sequential number, for class_tracker_id?

This could be part of the existing #453.

Contributor

ogrisel commented Oct 13, 2023

@gmcatsf feel free to open a PR for that. I am not sure how the proposal of using a sequential id would pan out in practice. We need to try it and see if the existing tests pass unchanged. We also need new tests to specify what we mean by deterministic pickle files.

We might need a thread-safe counter increment, probably with a lock. We cannot rely on the GIL because we would like this code to work with the nogil fork of CPython (and also PyPy).
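A lock-protected counter along these lines could be sketched as follows. `SequentialIdGenerator` and `next_id` are hypothetical names, not part of cloudpickle, and formatting the counter as a zero-padded hex string is only an illustrative choice.

```python
import threading


class SequentialIdGenerator:
    """Hands out increasing ids; safe to call from several threads."""

    def __init__(self, start=0):
        self._lock = threading.Lock()
        self._next = start

    def next_id(self):
        # The explicit lock makes read-and-increment atomic without
        # relying on the GIL.
        with self._lock:
            value = self._next
            self._next += 1
        # Illustrative format: 32 hex characters, zero-padded.
        return f"{value:032x}"


generator = SequentialIdGenerator()
print(generator.next_id())  # 00000000000000000000000000000000
print(generator.next_id())  # 00000000000000000000000000000001
```

Note that such a counter restarts in every process, so the ids are only unique within one interpreter. Whether that is enough for class tracking across processes is something the tests would have to show.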

An alternative to sequential ids would be to hash the contents of the class definition, but that might be too complex or expensive, because it might imply scanning the reference graph of the class object twice instead of once.
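For a sense of what a content hash might look like, here is a deliberately shallow sketch. `class_content_id` is a hypothetical helper: it hashes only the class name, base names, attribute names, and the bytecode and simple constants of plain functions. It does not follow the reference graph (globals, closures, nested code objects), which is exactly the expensive part mentioned above.

```python
import hashlib
import types


def class_content_id(cls):
    # Shallow approximation: does NOT traverse globals, closures, or
    # nested code objects referenced by the class.
    digest = hashlib.sha256()
    digest.update(cls.__qualname__.encode())
    for base in cls.__bases__:
        digest.update(base.__qualname__.encode())
    namespace = vars(cls)
    for name in sorted(namespace):
        digest.update(name.encode())
        value = namespace[name]
        if isinstance(value, types.FunctionType):
            code = value.__code__
            digest.update(code.co_code)
            digest.update(repr(code.co_names).encode())
            # Skip nested code objects: their repr contains addresses.
            consts = [c for c in code.co_consts
                      if not isinstance(c, types.CodeType)]
            digest.update(repr(consts).encode())
    return digest.hexdigest()[:32]


def make_class():
    class A:
        def f(self):
            return 1
    return A


def make_other_class():
    class B:
        def g(self):
            return "x"
    return B


# Two separately created but identical classes get the same id.
print(class_content_id(make_class()) == class_content_id(make_class()))  # True
# A class with different contents gets a different id.
print(class_content_id(make_class()) == class_content_id(make_other_class()))  # False
```

A real implementation would need to cover everything that affects the pickled class, which is where the second traversal of the reference graph would come from.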
