Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression from 0.3.5.1 to 0.3.6 - serialization of static methods in locally imported classes #572

Open
eladsegal opened this issue Jan 30, 2023 · 2 comments

Comments

@eladsegal
Copy link

eladsegal commented Jan 30, 2023

Description

In 0.3.5.1, it was possible to serialize static methods of imported classes.
In 0.3.6, that is no longer the case and the serialization is done only for the reference.

Steps to reproduce

Create files main.py and other_file.py:

# main.py
from dill import dumps
from other_file import SomeClass

print(dumps(SomeClass.some_func))
# other_file.py
class SomeClass:
    @staticmethod
    def some_func():
        number = 1
        print(number)
  1. Run python main.py.
  2. Change the number variable in other_file.py (for example to number = 2).
  3. Run python main.py again

With dill==0.3.5.1, the output is different between the two runs, while in dill==0.3.6 the output is the same.

@eladsegal eladsegal changed the title Regression from 0.3.5 to 0.3.6 - Lost capability to serialize static methods from imported classes Regression from 0.3.5.1 to 0.3.6 - Lost capability to serialize static methods from imported classes Jan 30, 2023
@mmckerns
Copy link
Member

Confirmed. dill-0.3.6 stores the locally imported class by reference (regardless of dill.settings).

@mmckerns mmckerns changed the title Regression from 0.3.5.1 to 0.3.6 - Lost capability to serialize static methods from imported classes Regression from 0.3.5.1 to 0.3.6 - serialization of static methods in locally imported classes Jan 30, 2023
@anivegesana
Copy link
Contributor

anivegesana commented Feb 20, 2023

@mmckerns I think that the new functionality is correct and the old result was wrong. I think in 0.3.6, one of the PRs that I opened changed function lookup from using __name__ to using __qualname__ (#486), so now dill can look up static methods and not pickle them if they are not in __main__. I think that what @eladsegal is requesting is something similar to #424. Maybe a followup on that issue would be desired? I believe the problem can be solved with @leogama's session module.

I guess for this particular case (huggingface/datasets#3297), you can set mapping_func.__module__ = None to force dill to pickle the function by reference instead of by name (since the name is no longer valid), but that is just a hack until there is a solution to #424. A more verbose, but probably identical, hack that wouldn't cause as much trouble is wrapping mapping_func in a function:

def mapping_func():
  def mapping_func_inst(examples):
    ID_LENGTH = 4
    examples["id"] = [id_[:ID_LENGTH] for id_ in examples["id"]]
    return examples
  return mapping_func_inst

squad.map(mapping_func(), batched=True)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants