JobLibCollisionWarning: Cannot detect name collisions for function 'unknown' #1497

Open
opyate opened this issue Aug 11, 2023 · 1 comment

opyate commented Aug 11, 2023

Hello, I get this warning when I cache partially applied functions (`functools.partial`) with `Memory.cache`:

JobLibCollisionWarning: Cannot detect name collisions for function 'unknown'

Here's the broken code:

    from functools import partial
    from joblib import Memory

    cache_dir = "./joblib_cache"  # placeholder; any writable directory
    mem = Memory(cache_dir, verbose=0)

    def foo(a: str, b: str):
        msg = f"a={a}, b={b}"
        print(f"foo called with {msg}")
        return msg

    # Two partials of the same function, differing only in the bound argument
    foo_1 = partial(foo, b="one")
    foo_2 = partial(foo, b="two")

    # Caching the partial objects directly is what triggers the warning
    foo_1_cached = mem.cache(foo_1, verbose=0)
    foo_2_cached = mem.cache(foo_2, verbose=0)

    foo_1_cached("hello")
    foo_1_cached("hello")
    foo_1_cached("hello")
    foo_2_cached("hello")
    foo_2_cached("hello")
    foo_1_cached("hello")
    foo_2_cached("hello")

Every time I switch from one cached function to the other, joblib re-evaluates the function and prints the warning:

foo called with a=hello, b=one
/home/opyate/anaconda3/envs/notebook-error-py310/lib/python3.10/site-packages/joblib/memory.py:655: JobLibCollisionWarning: Cannot detect name collisions for function 'unknown'
return self._cached_call(args, kwargs)[0]
foo called with a=hello, b=two
foo called with a=hello, b=one
foo called with a=hello, b=two
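
A likely reason for the `'unknown'` label, sketched with the standard library only: `functools.partial` objects do not carry a `__name__` attribute, so anything that identifies a callable by name has nothing to distinguish `foo_1` from `foo_2`. That joblib falls back to `'unknown'` for exactly this reason is an assumption based on the warning text, not something confirmed in this thread.

```python
from functools import partial


def foo(a: str, b: str):
    return f"a={a}, b={b}"


foo_1 = partial(foo, b="one")
foo_2 = partial(foo, b="two")

# A plain function has a name; a partial object does not.
print(hasattr(foo, "__name__"))    # True
print(hasattr(foo_1, "__name__"))  # False

# Both partials wrap the same underlying function, reachable via .func
print(foo_1.func.__name__)         # foo
print(foo_1.func is foo_2.func)    # True
```

If both partials end up under the same name, they would share one cache location, which would be consistent with the re-evaluation on every switch shown above.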

Here's a workaround:

    from functools import partial
    from joblib import Memory

    cache_dir = "./joblib_cache"  # placeholder; any writable directory
    mem = Memory(cache_dir, verbose=0)

    def foo(a: str, b: str):
        msg = f"a={a}, b={b}"
        print(f"foo called with {msg}")
        return msg

    foo_1 = partial(foo, b="one")
    foo_2 = partial(foo, b="two")

    # Wrap each partial in a named function so each one has its own name
    def named_foo_1(a: str):
        return foo_1(a)

    def named_foo_2(a: str):
        return foo_2(a)

    foo_1_cached = mem.cache(named_foo_1, verbose=0)
    foo_2_cached = mem.cache(named_foo_2, verbose=0)

    foo_1_cached("hello")
    foo_1_cached("hello")
    foo_1_cached("hello")
    foo_2_cached("hello")
    foo_2_cached("hello")
    foo_1_cached("hello")
    foo_2_cached("hello")

However, it would still be great to be able to set my own name for the cache entry, rather than relying on the function's name. Is this possible?

My use case: I'm calling out to an expensive API in a loop, so I won't be able to define named functions like in the workaround above.

E.g.

    for job in ['a', 'b', 'c']:
        fnp = partial(fn, foo=bar)
        fnp_cached = mem.cache(fnp, verbose=0)
        fnp_cached(...)  # subsequent runs clear the cache_dir entries for 'unknown'
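
One possible way around this in a loop, as a minimal sketch rather than an official joblib recipe: cache the named function once, then build the partial on top of the cached function instead of caching the partial. The bound argument then becomes part of the cached call's arguments, so each value gets its own cache entry. The `calls` list, the temporary `cache_dir`, and the loop values are assumptions added for illustration.

```python
import tempfile
from functools import partial

from joblib import Memory

cache_dir = tempfile.mkdtemp()  # assumption: throwaway directory for the demo
mem = Memory(cache_dir, verbose=0)

calls = []  # records every real (uncached) evaluation


def foo(a: str, b: str):
    calls.append((a, b))
    return f"a={a}, b={b}"


# Cache the named function itself; it has a proper __name__.
foo_cached = mem.cache(foo)

results = []
for b in ["one", "two", "one", "two"]:
    # Bind arguments on top of the cached function, not the other way round.
    fnp = partial(foo_cached, b=b)
    results.append(fnp("hello"))

print(calls)  # only the first call for each distinct b is evaluated
```

With this arrangement, switching between bound values should not invalidate earlier entries, because only `foo` is ever registered with the cache.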


opyate commented Aug 11, 2023

On second thought, I could redefine mem = Memory(cache_dir, verbose=0) with a unique cache dir in each iteration.

EDIT: it seems the subdirectories of "{cache_dir}/unknown" get cleared on every run, which defeats the purpose.
