__init__ and __call__ methods are not cloudpickled #464

Open
kocabiyikalper opened this issue Mar 2, 2022 · 1 comment

kocabiyikalper commented Mar 2, 2022

Overriding the logic inside the __call__ method of the wrapper:

from typing import Any

import cloudpickle
import wrapt
import pandas as pd

def _num_of_rows_wrapper() -> Any:
    class FunctionWrapper(wrapt.FunctionWrapper):
        def __call__(self, *args: Any, **kwargs: Any) -> Any:
            dataframe = super(FunctionWrapper, self).__call__(*args, **kwargs)
            num_of_rows = len(dataframe.index)
            print(f"Num of rows: {num_of_rows}")

            return dataframe

    return FunctionWrapper

def num_of_rows():
    @wrapt.decorator(proxy=_num_of_rows_wrapper())
    def wrapper(wrapped, instance, args, kwargs) -> Any:
        return wrapped(*args, **kwargs)

    return wrapper

@num_of_rows()
def prepare_data():
    data = [["id1", 10], ["id2", 15], ["id3", 14]]
    pandas_df = pd.DataFrame(data, columns=["id", "value"])
    return pandas_df

Running the decorated function works as expected (num of rows printed):

prepare_data()

Once it is cloudpickled and loaded, __call__ gets lost (num of rows is not printed):

dump = cloudpickle.dumps(prepare_data)
loaded = cloudpickle.loads(dump)
loaded()

The same happens for __init__. Any suggestions for keeping those methods with cloudpickle?

@pierreglaser
Member

Hey, thanks for the reproducer. The reason is a bad interaction between wrapt and cloudpickle. cloudpickle usually pickles all methods of a dynamically defined class, including __call__ and __init__.

But the instance-checking logic of wrapt with respect to its FunctionWrapper objects (see https://github.com/GrahamDumpleton/wrapt/blob/8f180bf981fc7a92094cfecfd7a9e5f591d4bd4b/src/wrapt/_wrappers.c#L2547-L2571), which essentially amounts to

isinstance(function_wrapper_object, type_of_wrapped_object) == True

severely confuses cloudpickle and makes it treat prepare_data as a function, whereas it should treat it as an instance of FunctionWrapper.
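The effect can be illustrated without wrapt. The sketch below uses a hypothetical TransparentProxy class that forwards __class__ to the wrapped object, which is assumed to approximate what the wrapt proxy does; looks_like_function is likewise a made-up stand-in for an isinstance-based check, not cloudpickle's actual code.

```python
import types


class TransparentProxy:
    """Hypothetical stand-in for a wrapt-style proxy (not wrapt's real code)."""

    def __init__(self, wrapped):
        self.__wrapped__ = wrapped

    @property
    def __class__(self):
        # Report the wrapped object's class instead of our own.
        return type(self.__wrapped__)

    def __call__(self, *args, **kwargs):
        print("custom __call__ logic runs")
        return self.__wrapped__(*args, **kwargs)


def looks_like_function(obj):
    # An isinstance-based check, similar in spirit to the one described above.
    return isinstance(obj, types.FunctionType)


def prepare_data():
    return [1, 2, 3]


proxy = TransparentProxy(prepare_data)

# isinstance() falls back to proxy.__class__, so the proxy passes as a function.
is_function_by_isinstance = looks_like_function(proxy)
# type() ignores the __class__ override and sees the real proxy class.
is_function_by_type = type(proxy) is types.FunctionType
```

Under these assumptions, a pickler that dispatches on isinstance would take its function-handling path for the proxy, and the custom __call__ defined on the proxy class would never be considered.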

I'm not sure whether there is a path forward yet. I believe wrapt could at least implement a __reduce__ method for such objects, but this isinstance logic would still produce a false positive in cloudpickle's reducer_override callback, which looks for function objects to pickle in a custom manner before falling back to __reduce__-based pickling for the object in question. So ideally, wrapt would reconsider its instance-checking semantics...
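The ordering described here can be seen with the standard library alone. In this sketch, Wrapper and OverridingPickler are hypothetical names, and the reduce values are deliberately artificial so the two paths are distinguishable; the point is only that pickle.Pickler.reducer_override (Python 3.8+) is consulted before an object's own __reduce__.

```python
import io
import pickle


class Wrapper:
    """Hypothetical object that describes its own pickling via __reduce__."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # Reduce to a tagged tuple so this path is easy to recognize.
        return (tuple, (("wrapper", self.value),))


class OverridingPickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Claim Wrapper instances before __reduce__ is consulted and
        # keep only the inner value, dropping the wrapper entirely.
        if isinstance(obj, Wrapper):
            return (str, (obj.value,))
        return NotImplemented


def dumps_with_override(obj):
    buffer = io.BytesIO()
    OverridingPickler(buffer).dump(obj)
    return buffer.getvalue()


# The default pickler goes through Wrapper.__reduce__.
plain = pickle.loads(pickle.dumps(Wrapper("data")))
# The subclass goes through reducer_override instead.
overridden = pickle.loads(dumps_with_override(Wrapper("data")))
```

This is why adding __reduce__ on the wrapt side alone may not be enough: if the override hook already classifies the object as a function, the object's own reducer is never reached.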
