Caching functions that return numpy.memmaps #1523

Open
tinducvo opened this issue Nov 17, 2023 · 0 comments


Currently, joblib caching allows numpy arrays to be accessed as numpy.memmap objects.

However, these numpy arrays first need to be created in memory. In order to work with larger-than-memory arrays, I tried writing to and returning a numpy.memmap object from my function. This worked, even after I deleted my memmap file. But I have two concerns:

  1. I am not sure if joblib caching loads the entire array into memory anyway.
  2. My data is written once to my memmap file and again to the joblib cache, which reduces performance and wears down storage.

from pathlib import Path

import numpy


def memmap_return(length: int) -> numpy.memmap:
    # Create the backing file in the current working directory.
    experiments_path_directory = Path(".").resolve()
    memmap_file = numpy.memmap(
        experiments_path_directory / "memmap_file",
        dtype="float64",
        mode="w+",  # r: read, r+: read/write, w+: create/read/write
        shape=(length, length),
    )
    return memmap_file
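
On the first concern, one thing that can be checked directly is what type a cache hit hands back. The following is a minimal sketch, assuming a `Memory` object created with `mmap_mode="r"`. The function name `make_array` and the temporary cache directory are made up for illustration. It only shows how the result is read back from the cache, not how much memory joblib uses while writing it.

```python
import tempfile

import numpy
from joblib import Memory

# Hypothetical throwaway cache location for this check.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, mmap_mode="r", verbose=0)


@memory.cache
def make_array(length: int) -> numpy.ndarray:
    # Stand-in for the real computation; small enough to fit in memory.
    return numpy.ones((length, length), dtype="float64")


first = make_array(64)   # computed, then stored in the cache
second = make_array(64)  # same arguments, so this should be served from the cache

# With mmap_mode set, the cached result is expected to come back memory-mapped.
print(type(second).__name__)
print(isinstance(second, numpy.memmap))
```

If the second call reports a `numpy.memmap`, reads from the cache are at least lazy. Whether the write path materializes the whole array is a separate question that this check does not answer.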

Is there some clean way to allow joblib to just copy that memmap file to its cache or otherwise efficiently cache this type of function?
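
One possible workaround, sketched below, is to cache only the path of the data file rather than the array itself, so the data is written to disk once. This is not a built-in joblib feature. The names `build_array_file` and `load_array` are hypothetical, and the sketch assumes the data file is kept alongside the cache, since joblib will not notice if it is deleted.

```python
import tempfile
from pathlib import Path

import numpy
from joblib import Memory
from numpy.lib.format import open_memmap

# Hypothetical working directory holding both the data files and the cache.
work_dir = Path(tempfile.mkdtemp())
memory = Memory(str(work_dir / "joblib_cache"), verbose=0)


@memory.cache
def build_array_file(length: int, data_dir: str) -> str:
    """Write the array into a .npy file and return only its path."""
    path = Path(data_dir) / f"array_{length}.npy"
    # open_memmap creates a .npy file (with header) backed by a memmap.
    out = open_memmap(str(path), mode="w+", dtype="float64",
                      shape=(length, length))
    out[:] = 1.0  # stand-in for the real, possibly chunked, computation
    out.flush()
    del out  # release the writable mapping
    return str(path)


def load_array(length: int, data_dir: Path) -> numpy.memmap:
    # joblib caches the small path string; the data itself is not copied.
    path = build_array_file(length, str(data_dir))
    return numpy.load(path, mmap_mode="r")


array = load_array(32, work_dir)
print(type(array).__name__, array.shape)
```

The trade-off is that the cache entry and the data file can get out of sync: clearing the joblib cache leaves the .npy files behind, and removing a .npy file leaves a cached path pointing at nothing.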
