Cloudpickle incompatible with DataDog/dd-trace-py #506

Open
lukesturgis opened this issue May 23, 2023 · 1 comment

Comments

@lukesturgis

Summary of problem

Cloudpickle fails to serialize objects by value when used with versions of ddtrace >1.5.0. This previously worked with ddtrace <1.5.0. Since then, ddtrace has changed module discovery to use a module watchdog that does not rely on the standard sys.modules dictionary to detect when modules are loaded or unloaded, and this does not seem to work with cloudpickle. I've reproduced this with the latest versions of ddtrace (1.12.6) and cloudpickle (2.2.1).
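One way to check what the watchdog changes in a given process is to look at the type of sys.modules and the finders on sys.meta_path before and after importing ddtrace. This is only a diagnostic sketch: `describe_import_machinery` is a hypothetical helper, and whether the watchdog shows up in either place depends on the ddtrace version, which I have not verified.

```python
import sys


def describe_import_machinery(modules=None, meta_path=None):
    """Summarize the type of sys.modules and the installed meta path finders.

    Hypothetical helper for illustration; both arguments default to the live
    interpreter state and can be overridden for testing.
    """
    modules = sys.modules if modules is None else modules
    meta_path = sys.meta_path if meta_path is None else meta_path
    return {
        "sys_modules_type": type(modules).__qualname__,
        # False if sys.modules was replaced by a dict subclass or another mapping.
        "sys_modules_is_plain_dict": type(modules) is dict,
        # Finders can be classes or instances, so fall back to the type name.
        "meta_path_finders": [
            getattr(finder, "__qualname__", None) or type(finder).__qualname__
            for finder in meta_path
        ],
    }


if __name__ == "__main__":
    # Run once before and once after `import ddtrace`, then compare the output.
    print(describe_import_machinery())
```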

Broken down, the failure seems to take the following route: cloudpickle/cloudpickle_fast.py:632 calls Pickler.dump on the object, which goes to dump at cpython/blob/3.11/Lib/pickle.py#L476. That calls save at cpython/blob/3.11/Lib/pickle.py#L535, which calls save_reduce from cpython/blob/3.11/Lib/pickle.py#L603, and at that point the object is somehow malformed.
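To narrow down which object is malformed, one option is to serialize each attribute of the affected module separately and record the ones that fail. The sketch below is a hypothetical helper (`find_unpicklable` is not part of cloudpickle or ddtrace); it defaults to the standard library's pickle.dumps so it runs on its own, and accepts any other `dumps` callable.

```python
import pickle


def find_unpicklable(namespace, dumps=pickle.dumps):
    """Return {name: exception} for every value in `namespace` that `dumps` rejects."""
    failures = {}
    # Copy the items first in case serialization triggers imports that mutate the namespace.
    for name, value in list(namespace.items()):
        try:
            dumps(value)
        except Exception as exc:  # pickling may raise PicklingError, TypeError, and others
            failures[name] = exc
    return failures


if __name__ == "__main__":
    import threading

    # A lock cannot be pickled, so it stands in for the malformed object here.
    demo = {"answer": 42, "lock": threading.Lock()}
    for name, exc in find_unpicklable(demo).items():
        print(f"{name}: {type(exc).__name__}: {exc}")
```

In the reproduction from this issue, the call would be something like `find_unpicklable(vars(test), dumps=cloudpickle.dumps)` after `import ddtrace`. Some entries may be reported even in a healthy environment, so the useful signal is the difference between a run with ddtrace and a run without it.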

I've confirmed this only happens when ddtrace is imported before cloudpickle, but in practice controlling the import order is not always feasible. I opened a bug ticket with Datadog, but this may be more appropriate as a feature request for cloudpickle to support this setup.
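In a larger application it is not always obvious which of the two packages was imported first. A rough check, assuming sys.modules still preserves insertion order and no entry has been removed and re-added, is to compare the positions of the two names. `imported_before` is a hypothetical helper written for this sketch.

```python
import sys


def imported_before(first, second, modules=None):
    """Return True if `first` appears before `second` in the module table.

    Returns None if either module is not loaded. Relies on the mapping
    preserving insertion order, which is an assumption if sys.modules
    has been replaced or rewritten.
    """
    names = list(sys.modules if modules is None else modules)
    if first not in names or second not in names:
        return None
    return names.index(first) < names.index(second)


if __name__ == "__main__":
    # In the affected application this would be
    # imported_before("ddtrace", "cloudpickle").
    print(imported_before("ddtrace", "cloudpickle"))
```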

Exception has occurred: PicklingError
args[0] from __newobj__ args has the wrong class
  File "/home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
  File "/home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/lukes/cloudpickle_test/python/modelingtests.py", line 24, in <module>
    x_after = pickle.dumps(to_pickle)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class
cloudpickle/cloudpickle_fast.py:632
            return Pickler.dump(self, obj)
https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L476
    def dump(self, obj):
        """Write a pickled representation of obj to the open file."""
        # Check whether Pickler was initialized correctly. This is
        # only needed to mimic the behavior of _pickle.Pickler.dump().
        if not hasattr(self, "_file_write"):
            raise PicklingError("Pickler.__init__() was not called by "
                                "%s.__init__()" % (self.__class__.__name__,))
        if self.proto >= 2:
            self.write(PROTO + pack("<B", self.proto))
        if self.proto >= 4:
            self.framer.start_framing()
        self.save(obj)
        self.write(STOP)
        self.framer.end_framing()
https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L535
    def save(self, obj, save_persistent_id=True):


https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L603
        self.save_reduce(obj=obj, *rv)


https://github.com/python/cpython/blob/3.11/Lib/pickle.py#L621
    def save_reduce(self, func, args, state=None, listitems=None,
                    dictitems=None, state_setter=None, *, obj=None):
        # This API is called by some subclasses

            cls, args, kwargs = args
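For reference, save_reduce rejects a `__newobj__` reduction when the class in args[0] is not the same object as `obj.__class__`. One way to hit that check without ddtrace is a transparent proxy whose `__class__` attribute reports the wrapped object's class while `type(obj)` is still the proxy class. The sketch below only shows that mechanism; `Wrapped` and `Proxy` are made-up names, and whether ddtrace's wrapped objects behave this way is an assumption I have not confirmed.

```python
import pickle


class Wrapped:
    """Stand-in for a real object, for example a module loader."""


class Proxy:
    """Hypothetical transparent proxy that reports the wrapped object's class."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    # type(proxy) is Proxy, but proxy.__class__ is type(wrapped).
    __class__ = property(lambda self: type(self._wrapped))


def try_pickle(obj):
    """Return None on success, or the PicklingError raised by pickle.dumps."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return exc
    return None


if __name__ == "__main__":
    proxy = Proxy(Wrapped())
    # The default reduction uses type(proxy), i.e. Proxy, as args[0], while the
    # pickler compares it against proxy.__class__, i.e. Wrapped.
    print(type(proxy).__name__, proxy.__class__.__name__)
    print(repr(try_pickle(proxy)))
```

On the Python 3.9 interpreter from the traceback I would expect the same "args[0] from __newobj__ args has the wrong class" message; newer Python versions may word it differently.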

Pip freeze

attrs==20.3.0
bytecode==0.13.0
cattrs==1.3.0
click==8.1.3
cloudpickle==2.2.1
coverage==6.3.1
ddsketch==2.0.4
ddtrace==1.12.6
Deprecated==1.2.13
envier==0.4.0
Flask==2.1.3
gunicorn==20.1.0
importlib-metadata==5.0.0
isort==5.10.1
itsdangerous==2.1.2
Jinja2==3.1.2
jsonschema==3.2.0
MarkupSafe==2.1.1
more-itertools==8.8.0
opentelemetry-api==1.17.0
packaging==20.9
protobuf==3.19.3
pyparsing==2.4.7
pyrsistent==0.15.5
six==1.14.0
tenacity==8.0.1
types-setuptools==57.4.17
typing_extensions==4.4.0
uWSGI==2.0.20
uwsgidecorators==1.1.0
Werkzeug==2.1.2
wrapt==1.11.2
xmltodict==0.13.0
zipp==0.6.0

Example

main.py

import cloudpickle as pickle
import ddtrace

import test

def to_pickle():
    return test.test()


x_before = pickle.dumps(to_pickle)
print(x_before)

pickle.register_pickle_by_value(test)
x_after = pickle.dumps(to_pickle)
print(x_after)

test.py

def test():
    return 1

Result

b'\x80\x05\x95\x19\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cM/home/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\tsubimport\x94\x93\x94h\n\x85\x94R\x94su\x86\x94\x86R0.'

Traceback (most recent call last):
  File "home/lukes/cloudpickle_test/python/modelingtests.py", line 24, in <module>
    x_after = pickle.dumps(to_pickle)
  File "home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "home/lukes/cloudpickle_test/python/cloudpickle/2/2/1/dist/lib/python3.9/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

Expected result

b'\x80\x05\x95\x19\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cM/home/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\tsubimport\x94\x93\x94h\n\x85\x94R\x94su\x86\x94\x86R0.'
b'\x80\x05\x95\xad\x04\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00\xa0\x00\xa1\x00S\x00\x94N\x85\x94\x8c\x04test\x94\x85\x94)\x8cMhome/lukes/cloudpickle_test/python/modelingtests.py\x94\x8c\tto_pickle\x94K\x10C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94\x8c\x08__file__\x94h\x0cuNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x17}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94h\nh\x00\x8c\x11dynamic_subimport\x94\x93\x94h\n}\x94(h\x13h\nh#Nh\x12\x8c\x00\x94\x8c\n__loader__\x94\x8c\x1a_frozen_importlib_external\x94\x8c\x10SourceFileLoader\x94\x93\x94)\x81\x94}\x94(\x8c\x04name\x94h\n\x8c\x04path\x94\x8cD/home/lukes/repos/DDG-311/ts/user/lukes/modelingtests/python/test.py\x94ub\x8c\x08__spec__\x94\x8c\x11_frozen_importlib\x94\x8c\nModuleSpec\x94\x93\x94)\x81\x94}\x94(h3h\n\x8c\x06loader\x94h1\x8c\x06origin\x94h5\x8c\x0cloader_state\x94N\x8c\x1asubmodule_search_locations\x94N\x8c\r_set_fileattr\x94\x88\x8c\x07_cached\x94\x8c\\home/lukes/cloudpickle_test/python/__pycache__/test.cpython-39.pyc\x94\x8c\r_initializing\x94\x89ubh\x15h5\x8c\n__cached__\x94hBh\nh\x02(h\x07(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94NK\x01\x86\x94))\x8cD/home/lukes/cloudpickle_test/python/test.py\x94h\nK\nC\x02\x00\x01\x94))t\x94R\x94}\x94(h\x12h,h\x13h\nh\x15h5uNNNt\x94R\x94h\x1ahM}\x94}\x94(h\x13h\nh\x1dh\nh\x1e}\x94h Nh!Nh"h\nh#Nh$Nh%]\x94h\'}\x94u\x86\x94\x86R0u\x86\x94R\x94su\x86\x94\x86R0.'

Please let me know if I can provide any more info. Thanks!

@ogrisel
Contributor

ogrisel commented Jul 13, 2023

Thanks for the report.

I personally won't have time to dig into it or to work on supporting this myself, but I can help review a PR if someone works on it.
