Cannot use callable that was pickled within pytest #623

Open
dionhaefner opened this issue Oct 9, 2023 · 14 comments

@dionhaefner

dionhaefner commented Oct 9, 2023

I am running tests that serialize callables with dill and try to load them in a subprocess to make sure everything worked correctly. I am getting a cryptic error when trying to load the callable from the subprocess, presumably because dill is failing to load the test module.

Example:

# save as dill_test.py
import sys
import tempfile
from textwrap import dedent

def foo():
    pass

def test_dill():
    import subprocess
    import dill

    with tempfile.TemporaryDirectory() as tmpdir:
        picklefile = f"{tmpdir}/foo.pickle"

        with open(picklefile, "wb") as f:
            f.write(dill.dumps(foo))

        test_script = dedent(f"""
        import dill
        with open("{picklefile}", "rb") as f:
            func = dill.load(f)
        func()
        """)

        subprocess.run([sys.executable, "-c", test_script], check=True)

if __name__ == "__main__":
    test_dill()
    print("ok")

Calling through pytest gives this error:

$ pytest dill_test.py
E               subprocess.CalledProcessError: Command '['/Users/dion/.virtualenvs/py312/bin/python', '-c', '\nimport dill\nwith open("/var/folders/fk/g5ssrkz179z1mjmvqn1j3q1m0000gn/T/tmphuyt802o/foo.pickle", "rb") as f:\n    func = dill.load(f)\nfunc()\n']' returned non-zero exit status 1.

/opt/homebrew/Cellar/python@3.12/3.12.0/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py:571: CalledProcessError
-------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 287, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 442, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 432, in find_class
    return StockUnpickler.find_class(self, module, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'dill_test'
========================================================================= short test summary info =========================================================================
FAILED tests/dill_test.py::test_dill - subprocess.CalledProcessError: Command '['/Users/dion/.virtualenvs/py312/bin/python', '-c', '\nimport dill\nwith open("/var/folders/fk/g5ssrkz179z1mjmvqn1j3q1m0000gn/...

Calling it directly works:

$ python dill_test.py
ok

@dionhaefner
Author

Funnily enough, it works when I do this before pickling:

foo.__globals__.pop(foo.__name__)
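
One possible reading of why this helps, consistent with the `_locate_function` discussion further down this thread: a by-reference pickle is only possible while the function can be found under its own name in its module. The sketch below uses only the standard-library `pickle` (not dill) and a hypothetical in-memory module named `demo_mod`; it shows that the name lookup fails once the name is popped, which is presumably the point where dill stops treating the function as importable and serializes it by value instead.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the test module: an in-memory module with one function.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("def foo():\n    return 42\n", mod.__dict__)
foo = mod.foo

# Stock pickle stores only a reference (module name + function name).
by_reference = pickle.dumps(foo)

# Same trick as above: remove the name from the function's own globals.
foo.__globals__.pop(foo.__name__)

try:
    pickle.dumps(foo)
    lookup_failed = False
except (pickle.PicklingError, AttributeError):
    # demo_mod.foo no longer resolves, so a by-reference pickle is impossible.
    lookup_failed = True

print(lookup_failed)
```

Stock `pickle` simply gives up at this point; whether dill falls back to pickling the code object is best confirmed with `dill.detect.trace(True)`.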

@mmckerns
Member

mmckerns commented Oct 9, 2023

I want to make sure I understand this correctly: running your script normally works, but if you run it under the control of pytest (and subprocess), it throws the error above. Is that correct? If so, I'd be interested to see the output with dill.detect.trace(True).

@dionhaefner
Author

dionhaefner commented Oct 10, 2023

That's what I thought, but I now realize this is actually a path issue.

$ python tests/dill_test.py
ok

$ cd tests
$ pytest dill_test.py
ok

$ pytest tests/dill_test.py
NOT OK

So in the last case, dill.load tries to import the dill_test module but fails because the directory containing dill_test.py is not on the subprocess's sys.path. It is fixed by changing the load script to this:

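# requires "import os" at the top of dill_test.py
test_script = dedent(f"""
        import dill
        import sys
        # make the directory containing dill_test.py importable in the subprocess
        sys.path.append("{os.path.dirname(__file__)}")
        with open("{picklefile}", "rb") as f:
            func = dill.load(f)
        func()
""")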

Is there a way to pickle a function so it can be executed even if the original module isn't available when unpickling?
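
This does not remove the import dependency asked about above. As a rough alternative to editing sys.path inside the load script, the subprocess can be given the module directory through the `PYTHONPATH` environment variable. The sketch below uses only the standard library; `run_with_module_dir` and `helper_mod_demo` are hypothetical names made up for the illustration, and it assumes the interpreter is not started in isolated or safe-path mode.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_module_dir(script, module_dir):
    """Run `script` in a fresh interpreter with `module_dir` prepended to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = module_dir if not existing else module_dir + os.pathsep + existing
    return subprocess.run(
        [sys.executable, "-c", script], env=env, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as module_dir, tempfile.TemporaryDirectory() as other_dir:
    # Hypothetical module that lives outside the working directory of the subprocess.
    Path(module_dir, "helper_mod_demo.py").write_text("VALUE = 7\n")
    script = "import helper_mod_demo; print(helper_mod_demo.VALUE)"

    # Run from an unrelated directory: the module cannot be found.
    missing = subprocess.run(
        [sys.executable, "-c", script], cwd=other_dir, capture_output=True, text=True
    )
    # Same script, but with the module directory on PYTHONPATH.
    found = run_with_module_dir(script, module_dir)

print(missing.returncode != 0, found.stdout.strip())
```

In the test above, `module_dir` would be `os.path.dirname(__file__)`.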

@mmckerns
Member

Generally, dill assumes that module dependencies are installed. It does provide different approaches for tracing dependencies in the global scope, but what you might be able to do in any case is dump the module along with the function. Then you'd load the module first, and then the function. Something like this is only needed for "uninstalled" modules. This is OK for saving state, but not really that good for parallel computing.

@dionhaefner
Author

Generally, dill assumes that module dependencies are installed.

But why is this module a dependency in the first place? The function doesn't access any globals.

@mmckerns
Member

The global dict is required to create a function object.

Python 3.8.18 (default, Aug 25 2023, 04:23:37) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import types
>>> print(types.FunctionType.__doc__)
Create a function object.

  code
    a code object
  globals
    the globals dictionary
  name
    a string that overrides the name from the code object
  argdefs
    a tuple that specifies the default argument values
  closure
    a tuple that supplies the bindings for free variables
>>> 

However, dill has different settings that modify how the global dict is handled. So, you can try dill.settings['recurse'] = True, which will only pickle items in the global dict that are pointed to by the function, and otherwise stores a dummy global dict.
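
To illustrate the point about the global dict, here is a minimal sketch using only the standard library (it is not how dill works internally). It rebuilds a function from its code object with a dummy globals dict, which is enough when the function does not touch any module-level names. The use of `marshal` is an assumption made for the illustration; its format is specific to the Python version.

```python
import builtins
import marshal
import types


def foo(x):
    # Uses no module-level names, so it does not need its original globals.
    return x * 2


# Serialize only the code object (marshal output is tied to the Python version).
payload = marshal.dumps(foo.__code__)

# Rebuild: FunctionType insists on a globals dict, but a dummy one is sufficient here.
code = marshal.loads(payload)
dummy_globals = {"__builtins__": builtins}
rebuilt = types.FunctionType(code, dummy_globals, "foo")

result = rebuilt(21)
print(result)  # 42
```

A function that does read module-level names would need those entries copied into the dummy dict, which is roughly the job the recurse setting is described as doing above.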

@dionhaefner
Author

Thanks, I think I understand the problem now. Setting recurse=True doesn't work, though; I guess that's due to some modifications done to the callable by pytest.

@mmckerns
Member

mmckerns commented Oct 11, 2023

you can often see what's going on with dill.detect.trace(True)

@dionhaefner
Author

Okay here goes nothing.

This is the case that works:

$ python tests/dill_test.py
┬ F1: <function foo at 0x102580040>
├┬ F2: <function _create_function at 0x102fb32e0>
│└ # F2 [34 B]
├┬ Co: <code object foo at 0x102755b00, file "/private/tmp/tests/dill_test.py", line 6>
│├┬ F2: <function _create_code at 0x102fb3370>
││└ # F2 [19 B]
│└ # Co [102 B]
├┬ D2: <dict object at 0x0102fc49c0>
│└ # D2 [25 B]
├┬ D2: <dict object at 0x0102956a00>
│└ # D2 [2 B]
├┬ D2: <dict object at 0x0102fc4b80>
│├┬ D2: <dict object at 0x0102938ac0>
││└ # D2 [2 B]
│└ # D2 [23 B]
└ # F1 [198 B]

This is the one that doesn't:

$ pytest tests/dill_test.py
┬ F2: <function foo at 0x104473be0>
└ # F2 [20 B]

So if pytest is involved, dill doesn't even try to pickle any of the function's attributes...?

@mmckerns
Member

Essentially, yes. "F2" is passing the function off to pickle. The key is that there's an internal function called _locate_function, and if that returns False... probably in this case because _import_module does not find the module... then it punts to pickle which gives up.

@dionhaefner
Author

Isn't it the other way around? According to https://github.com/uqfoundation/dill/blob/master/dill/_dill.py#L1881C12-L1881C12, dill uses the stock pickler when _locate_function returns True. But this is not what I want, since I want to dump the function object itself, not a reference to it.

@mmckerns
Member

Yes, you are correct. I missed the not in the if statement.
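
For readers following along, the consequence of handing a function to the stock pickler can be sketched with the standard library alone. The payload holds only the module and function names, so loading fails as soon as the module cannot be imported, which matches the `find_class` traceback in the original report. The module name `vanishing_mod` is hypothetical and created in memory just for this illustration.

```python
import pickle
import sys
import types

# Hypothetical module that exists only while pickling.
mod = types.ModuleType("vanishing_mod")
sys.modules["vanishing_mod"] = mod
exec("def foo():\n    return 'secret-body'\n", mod.__dict__)

payload = pickle.dumps(mod.foo)

# The function body is not in the payload, only the names needed to look it up.
has_body = b"secret-body" in payload
has_reference = b"vanishing_mod" in payload

# Simulate the subprocess, where the module is not importable.
del sys.modules["vanishing_mod"]
try:
    pickle.loads(payload)
    error_name = None
except ImportError as exc:
    error_name = type(exc).__name__

print(has_body, has_reference, error_name)
```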

@dionhaefner
Author

Could you imagine having a flag similar to byref for modules that forces dill to pickle the function object instead of a reference to it? I think this would get us a lot closer to what we want to achieve.

@mmckerns
Member

yes, there is a PR that is mostly done that handles a bunch of module serialization variants. work on it seems to have stalled a bit though.
