Allow 3.7 Pickles to be Loaded in 3.8 #406

Open
wants to merge 3 commits into
base: master

@njwhite njwhite commented Mar 12, 2021

Use the _create_code logic when loading pickled objects rather than just the builtin CodeType. If the pickle file was created using, e.g., Python 3.7, then the serialized code object will contain 15 arguments (missing co_posonlyargcount), but the CodeType in the current (Python 3.8) interpreter expects 16. This PR just fills in a zero for the missing argument, reusing the existing logic in _dill.py.

Related issues: #357 #318 #394 cloudpipe/cloudpickle#396 python/cpython#12701 facebookincubator#39
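
To illustrate the argument-count mismatch described above, here is a minimal sketch of the padding step. It is not dill's actual _create_code; the helper name `pad_code_args` is hypothetical. It assumes only that a Python 3.7 code object serializes 15 constructor arguments and that Python 3.8 expects posonlyargcount as the second of 16.

```python
def pad_code_args(args):
    """Return CodeType constructor args in the 16-field Python 3.8 layout.

    Hypothetical helper for illustration; it only rearranges a tuple and
    never builds a code object itself.
    """
    args = tuple(args)
    if len(args) == 16:
        # Already in the 3.8 layout, nothing to do.
        return args
    if len(args) == 15:
        # 3.7 layout: insert posonlyargcount=0 right after argcount.
        return (args[0], 0) + args[1:]
    raise ValueError("unexpected number of code arguments: %d" % len(args))
```

On a 3.8 interpreter the padded tuple could then be passed to `types.CodeType(*padded)`; later Python versions changed the constructor again, so the sketch stops at the tuple.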

@njwhite
Author

njwhite commented Mar 22, 2021

Hi @mmckerns, can I get a review for this? I’m just moving around a few lines of existing code, so I don’t think the code coverage CI failure is meaningful. Thanks.

@mmckerns
Member

Impact of the change needs to be assessed on the various use cases.

@mmckerns
Member

Is your description still correct? It seems that all this PR does is move _create_code and then add it to the typemap. Can you clarify what this PR is meant to do, exactly?

@njwhite
Author

njwhite commented May 29, 2021

The key bit is line 585 - when the deserialisation code sees a CodeType it now uses the backcompat logic in _create_code to interpret it. Previously dill would just try to read the CodeType using the default logic, which fails on 3.8 when you try to read a 3.7 pickle as the signature has changed.
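
A toy model of the dispatch described above, for readers unfamiliar with the mechanism. This is a sketch and not dill's code: `Record`, `create_record`, `REVERSE_TYPEMAP`, and `rebuild` are all hypothetical stand-ins. The assumption is that the loader looks a type name up in a map and calls whatever it finds with the stored arguments, so swapping the map entry for a back-compat factory changes how old payloads are rebuilt.

```python
class Record:
    """Hypothetical type whose constructor gained a middle argument."""

    def __init__(self, first, inserted, last):
        self.first, self.inserted, self.last = first, inserted, last


def create_record(*args):
    """Back-compat factory: also accept the old 2-argument layout."""
    if len(args) == 2:
        first, last = args
        return Record(first, 0, last)  # fill the new field with a default
    return Record(*args)


# Name -> callable used when rebuilding a stored object (stand-in map).
REVERSE_TYPEMAP = {"Record": Record}


def rebuild(name, args):
    """Look the name up and call the mapped callable with the stored args."""
    return REVERSE_TYPEMAP[name](*args)


old_payload = ("Record", (1, 2))  # written before `inserted` existed

try:
    rebuild(*old_payload)
except TypeError as exc:
    print("direct constructor fails:", exc)

# Route the name through the factory instead of the bare type.
REVERSE_TYPEMAP["Record"] = create_record
obj = rebuild(*old_payload)
print(obj.first, obj.inserted, obj.last)
```

New-layout payloads still work after the swap, because the factory forwards full argument lists unchanged.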

@mmckerns
Member

Here's an example of the current behavior:

Python 3.7.10 (default, Mar 18 2021, 06:11:04) 
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> def foo(w, x=1, *y, **z):
...   return w+x+sum(y)+sum(z.values())
... 
>>> f = open('foo.pkl', 'wb')
>>> dill.dump(foo, f)
>>> dill.dump(foo.__code__, f)
>>> f.close()

and then in 3.8

Python 3.8.10 (default, May  7 2021, 23:18:56) 
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('foo.pkl', 'rb')
>>> foo = dill.load(f)
>>> c = dill.load(f)
>>> foo(1)
2
>>> f.close()

Can you provide a case where dill currently fails, but your PR enables it to succeed?

@njwhite
Author

njwhite commented Jun 1, 2021

@mmckerns I've pushed a test case - it fails if you comment out the handler.

def lambda_a():
    pkl = os.path.join(
        os.path.dirname(__file__),
        "lambda.pkl")

Member

It would be much better to write the file contents elsewhere, instead of relying on a stored pickle file. Was the file written in python 3.7? dill tests are currently run with 2.7, 3.6, 3.7, 3.8, 3.9, 3.10, pypy27, pypy36, and pypy37. Is the test only supposed to run with 3.8?

You don't need to add a test case into the code at the moment, if it's difficult. Rather, just present the details in the main conversation of the GitHub issue, so that others and I can reproduce what you are seeing.

Author

The file was written with python 3.7 / dill 0.3.0. I think that’s why you can’t put the file contents there (and need the binary) - the bug seems to be that dill isn’t backwards compatible.

The pickled object is just lambda x: x.

Member

Here's an example using lambda in dill master with 3.7:

Python 3.7.10 (default, Mar 18 2021, 06:11:04) 
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.dump(lambda x:x, open('test.pkl', 'wb'))

and loading with 3.8...

Python 3.8.10 (default, May  7 2021, 23:18:56) 
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = dill.load(open('test.pkl', 'rb'))
>>> f(4)
4
>>> 

What is your PR doing that is not possible currently?

Author

You need to dump with dill 0.3.0 and python 3.7, not master/3.7 to reproduce.

Member

So... you are saying that the issue is that old pickles from python 3.7 created with dill 0.3.0 don't unpickle in python 3.8 with dill master. I'm assuming this is also the case for other old versions of dill (before _create_function was recently modified).

Am I correct in thinking that you could, as a workaround, load the pickle in 3.7 with dill master, and then dump it again... then the resulting file would be able to be opened with 3.8?
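
A rough sketch of the load-and-dump-again workaround asked about above, to be run under the interpreter that can still read the old file. The helper name `repickle` is hypothetical. The `serializer` parameter defaults to the stdlib pickle module so the sketch runs anywhere; passing the dill module instead assumes only its `load` and `dump` functions as used elsewhere in this thread.

```python
import pickle


def repickle(src_path, dst_path, serializer=pickle):
    """Load every object stored in src_path and dump it again to dst_path.

    `serializer` is any module exposing load(file) and dump(obj, file),
    for example pickle or dill. Returns the number of objects rewritten.
    """
    objects = []
    with open(src_path, "rb") as src:
        while True:
            try:
                objects.append(serializer.load(src))
            except EOFError:
                # No more pickled objects in the file.
                break
    with open(dst_path, "wb") as dst:
        for obj in objects:
            serializer.dump(obj, dst)
    return len(objects)
```

For the scenario in this thread the call would look like `repickle('old.pkl', 'new.pkl', serializer=dill)` on Python 3.7 with a recent dill; the file names are placeholders.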

Author

My 3.8 workaround is just to set:

dill._dill._reverse_typemap['CodeType'] = dill._dill._create_code

much easier than re-serialising all the pickle files I have lying around :)
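
The workaround above can be wrapped in a small helper so it is applied once at startup. This is a sketch, assuming the private names `_reverse_typemap` and `_create_code` quoted in this thread exist in the installed dill; `patch_codetype_loader` is a hypothetical name, and private attributes may change between releases.

```python
def patch_codetype_loader(dill_module):
    """Route 'CodeType' lookups through dill's _create_code.

    `dill_module` is the imported dill package. The function touches
    private attributes (dill._dill), so treat it as a stopgap.
    Returns the callable now registered for 'CodeType'.
    """
    internals = dill_module._dill
    internals._reverse_typemap['CodeType'] = internals._create_code
    return internals._reverse_typemap['CodeType']
```

Typical use would be `import dill; patch_codetype_loader(dill)` before the first `dill.load` call on Python 3.8.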

Member

Riiight, of course... hence the PR. I never really considered it; however, I'm wondering whether adding others of the _create_ functions to the _reverse_typemap is worth investigating. I'm not certain what functionality it might impact.

2 participants