If the two files are combined instead, changes to the function are detected correctly. But on any realistic codebase, code is naturally modularized into separate files.
This issue is a duplicate of #3297. This is a limitation of dill, a package we use for caching (non-`__main__` module objects are serialized by reference). You can find more info about it here: uqfoundation/dill#424.
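The "serialized by reference" behavior can be seen with the standard-library `pickle` alone, which handles importable functions the same way dill handles non-`__main__` ones: only the module and qualified name are stored, never the function body, so two different bodies produce identical bytes. (This is a minimal illustration, not `datasets` internals; the function name is arbitrary.)

```python
import pickle

# A module-level function pickles by reference: the payload records only
# the module and qualified name, not the function body.
def transform(example):
    example["len"] = len(example["text"])
    return example

before = pickle.dumps(transform)

# Redefine it with a different body under the same name, as an edit would.
def transform(example):
    example["len"] = 0
    return example

after = pickle.dumps(transform)

# The two payloads are byte-identical, so a cache key derived from them
# cannot notice that the function changed.
assert before == after
```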
In your case, moving

```python
data = datasets.load_dataset('json', data_files=['/tmp/test.json'], split='train')
data = data.map(transform)
```

to test.py and setting `transform.__module__ = None` at the end of dataset.py should fix the issue.
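A sketch of what dataset.py would look like with that fix applied; the body of `transform` here is a hypothetical placeholder, only the `__module__` line is the actual workaround from this thread:

```python
# dataset.py -- sketch of the suggested workaround.

def transform(example):
    # Hypothetical preprocessing step; replace with your own logic.
    example["text"] = example["text"].lower()
    return example

# Clearing __module__ makes dill treat transform like a __main__ object
# and serialize it by value, so edits to the body change the pickled
# bytes and invalidate the stale cache entry.
transform.__module__ = None
```

test.py then imports `transform` from dataset.py and calls `data.map(transform)` as before.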
I understand this may be a limitation of an upstream tool, but for a user of `datasets` this is very annoying: when you have dozens of different datasets, each with its own preprocessing function, you can't realistically move them all into the same file. It may be worth seeing if there is a way to specialize the dependency (e.g. subclass it) and enforce behavior that makes sense for your product.
I was able to work around this for now by setting `__module__ = None`. If such workarounds are required, it would be better to document them somewhere than to leave them buried in a single obscure issue from long ago.
As this is a duplicate issue, I'm closing it.
I have another issue with the cache (#6179). Can you take a look?
1. Initialize the cache
2. Edit dataset.py and uncomment the commented line, run again
3. Clear the cache and run again
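The edit-and-rerun step above can be simulated with the standard library alone (stdlib `pickle` shares dill's by-reference handling of importable functions). The temp-module name `dataset` and the edited line are illustrative, not the original repro files:

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Create a throwaway module file, like the dataset.py in the repro.
tmp = pathlib.Path(tempfile.mkdtemp())
mod = tmp / "dataset.py"
mod.write_text("def transform(example):\n    return example\n")
sys.path.insert(0, str(tmp))

import dataset
before = pickle.dumps(dataset.transform)

# "Edit dataset.py and uncomment the commented line, run again":
# change the function body and reload the module.
mod.write_text(
    "def transform(example):\n"
    "    example['x'] = 1\n"
    "    return example\n"
)
importlib.reload(dataset)
after = pickle.dumps(dataset.transform)

# The serialized bytes are unchanged, so a cache keyed on them
# keeps serving the stale result.
assert before == after
```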