Describe the issue: The functions that I want to execute with the `ClientExecutor` are dynamically imported via the module's path.
Even if I register the dynamically imported module with `cloudpickle.register_pickle_by_value`, the deserialization fails.
Minimal Complete Verifiable Example:
First, `main.py` imports the module dynamically and then submits the imported function to the executor. For comparison, the executor from loky was also tested; it did not raise an error.
```python
# Content of main.py
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

import cloudpickle
from distributed import Client, LocalCluster
from loky import get_reusable_executor


def import_path(path: Path) -> ModuleType:
    """Adapted from https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly."""
    module_name = path.name
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None:
        raise ImportError(f"Can't find module {module_name!r} at location {path}.")
    mod = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = mod
    spec.loader.exec_module(mod)
    return mod


if __name__ == "__main__":
    # Import the module.
    module = import_path(Path("functions.py").resolve())
    # Register the module for pickling.
    cloudpickle.register_pickle_by_value(module)

    # with get_reusable_executor(max_workers=1) as executor:
    #     future = executor.submit(module.func)

    client = Client(LocalCluster(n_workers=1))
    with client.get_executor() as executor:
        future = executor.submit(module.func)
        print(future.result())
```
Second, a module `functions.py` holds the dynamically imported function:

```python
# Content of functions.py
def func():
    return "SUCCESS"
```
Running the code yields:

```
❯ python main.py
2024-04-03 00:07:17,237 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/core.py", line 175, in loads
    return msgpack.loads(
           ^^^^^^^^^^^^^^
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/core.py", line 172, in _decode_default
    return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 96, in loads
    return pickle.loads(x)
           ^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'functions.py'; 'functions' is not a package
2024-04-03 00:07:17,345 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f006efefa10>>, <Task finished name='Task-4' coro=<Worker.handle_scheduler() done, defined at /home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/worker.py:203> exception=ModuleNotFoundError("No module named 'functions.py'; 'functions' is not a package")>)
Traceback (most recent call last):
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/tornado/ioloop.py", line 750, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/tornado/ioloop.py", line 774, in _discard_future_result
    future.result()
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/worker.py", line 206, in wrapper
    return await method(self, *args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/worker.py", line 1302, in handle_scheduler
    await self.handle_stream(comm)
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/core.py", line 1025, in handle_stream
    msgs = await comm.read()
           ^^^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/comm/tcp.py", line 247, in read
    msg = await from_frames(
          ^^^^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/comm/utils.py", line 78, in from_frames
    res = _from_frames()
          ^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/comm/utils.py", line 61, in _from_frames
    return protocol.loads(
           ^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/core.py", line 175, in loads
    return msgpack.loads(
           ^^^^^^^^^^^^^^
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/core.py", line 172, in _decode_default
    return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tobia/micromamba/envs/pytask-parallel/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 96, in loads
    return pickle.loads(x)
           ^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'functions.py'; 'functions' is not a package
```
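The failure mode can be reproduced with the standard library alone, no dask or loky needed. The key detail is that `import_path` registers the module under `path.name`, i.e. the string `"functions.py"` with the suffix included. Plain pickle serializes the function by reference as the pair `("functions.py", "func")`, and the receiving side then re-imports that name, which the import system parses as submodule `py` of package `functions`. A sketch (the scratch directory stands in for the project folder):

```python
import importlib.util
import pickle
import sys
import tempfile
from pathlib import Path

# Scratch directory standing in for the project folder.
workdir = Path(tempfile.mkdtemp())
(workdir / "functions.py").write_text('def func():\n    return "SUCCESS"\n')
sys.path.insert(0, str(workdir))  # like sys.path[0] when running `python main.py`

# Import the file under the name "functions.py" -- path.name keeps the
# suffix, exactly as import_path() in the MCVE does.
spec = importlib.util.spec_from_file_location("functions.py", str(workdir / "functions.py"))
mod = importlib.util.module_from_spec(spec)
sys.modules["functions.py"] = mod
spec.loader.exec_module(mod)

# Plain pickle serializes the function *by reference*: the payload only
# stores the pair ("functions.py", "func").
payload = pickle.dumps(mod.func)

# A worker process has no "functions.py" entry in sys.modules, so unpickling
# must re-import the name, which is read as submodule "py" of package
# "functions". Simulate the fresh process by dropping both entries.
sys.modules.pop("functions.py")
sys.modules.pop("functions", None)
try:
    pickle.loads(payload)
    err = ""
except ModuleNotFoundError as exc:
    err = str(exc)
print(err)  # No module named 'functions.py'; 'functions' is not a package
```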
Anything else we need to know?:
Everything works if I change the serialization in `distributed/protocol/pickle.py`, line 63 (at 5647d06), from `pickle` to `cloudpickle`.
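That this is enough can be seen without dask at all: once the module is registered, cloudpickle embeds the function's code in the payload, and even plain `pickle.loads` on the receiving side reconstructs it without re-importing anything by name. A sketch, assuming cloudpickle is installed:

```python
import importlib.util
import pickle
import sys
import tempfile
from pathlib import Path

import cloudpickle

# Recreate the dynamically imported module from the MCVE.
workdir = Path(tempfile.mkdtemp())
(workdir / "functions.py").write_text('def func():\n    return "SUCCESS"\n')
spec = importlib.util.spec_from_file_location("functions.py", str(workdir / "functions.py"))
mod = importlib.util.module_from_spec(spec)
sys.modules["functions.py"] = mod
spec.loader.exec_module(mod)

# With the registration, cloudpickle serializes the function by value:
# the payload carries the code itself, not a ("functions.py", "func") pair.
cloudpickle.register_pickle_by_value(mod)
payload = cloudpickle.dumps(mod.func)

# Plain pickle.loads reconstructs it even after the module entry is gone,
# because nothing in the payload needs to be re-imported by name.
sys.modules.pop("functions.py")
restored = pickle.loads(payload)
print(restored())  # SUCCESS
```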
Maybe the logic can be adjusted so that any module showing up in `cloudpickle.list_registry_pickle_by_value()` is treated as one the user meant to pickle by value.
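The proposed check could look roughly like the following. This is a hypothetical sketch, not distributed's internals: `dumps_respecting_registry` is an illustrative name, the real change would live in `distributed/protocol/pickle.py`, and a complete version would also consider parent packages of registered modules.

```python
import pickle

import cloudpickle


def dumps_respecting_registry(obj):
    """Serialize with plain pickle by default, but fall back to cloudpickle
    when the object's module was registered for pickle-by-value."""
    # Normalize registry entries to module names (strings).
    registered = {
        m if isinstance(m, str) else m.__name__
        for m in cloudpickle.list_registry_pickle_by_value()
    }
    if getattr(obj, "__module__", None) in registered:
        return cloudpickle.dumps(obj)
    return pickle.dumps(obj)


# Anything not registered keeps the plain-pickle fast path.
roundtripped = pickle.loads(dumps_respecting_registry(len))
print(roundtripped is len)  # True
```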
The issue also touches on #7841. If the user in that issue had used `--import-mode importlib` as the import mode for pytest, the same issue would have appeared. pytest basically uses `import_path` under the hood with some adjustments.
Environment:

- Dask version: 2024.3.1
- Python version: 3.11.8
- Operating System: WSL
- Install method (conda, pip, source): conda