Caching functions that return numpy.memmaps #1523

tinducvo · 2023-11-17T01:33:44Z

Currently, joblib caching allows numpy arrays to be accessed as numpy.memmap objects.
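For reference, here is a minimal sketch of that behaviour. It assumes a recent joblib whose `Memory` accepts `mmap_mode`. On a cache hit, the cached array comes back as a read-only `numpy.memmap` instead of being read fully into memory. The function name `make_array` and the temporary cache directory are only illustrative.

```python
import tempfile

import numpy as np
from joblib import Memory

# Throwaway cache directory for this sketch.
cache_dir = tempfile.mkdtemp()

# mmap_mode="r" asks joblib to reopen cached numpy arrays as read-only memmaps.
memory = Memory(location=cache_dir, mmap_mode="r", verbose=0)


@memory.cache
def make_array(n: int) -> np.ndarray:
    # An ordinary in-memory array; joblib pickles it into the cache on the first call.
    return np.arange(n * n, dtype="float64").reshape(n, n)


first = make_array(4)   # computed, then written to the cache
second = make_array(4)  # cache hit: loaded from disk with mmap_mode="r"
print(type(second).__name__, second.mode)
```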

However, these numpy arrays first need to be created. To work with larger-than-memory arrays, I tried writing to a numpy.memmap object and returning it from my function. This worked, even after I deleted my memmap file. However:

- I am not sure whether joblib caching loads the entire array into memory anyway.
- My data is written once to my memmap file and again to the joblib cache, which reduces performance and wears down storage.

```python
from pathlib import Path

import numpy


def memmap_return(length: int) -> numpy.memmap:
    # Create (or overwrite) a float64 memmap file in the current working directory.
    experiments_path_directory = Path(".").resolve()
    memmap_file = numpy.memmap(
        experiments_path_directory / "memmap_file",
        dtype="float64",
        mode="w+",  # r: read, r+: read/write, w+: create/read/write
        shape=(length, length),
    )
    return memmap_file
```
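The double write can be observed by caching a variant of `memmap_return`. This sketch assumes `Memory(mmap_mode="r")` and uses temporary directories (`work_dir` and `cache_dir` are illustrative names). It also fills the array so that there is data to copy. On a cache hit, the returned memmap is backed by a file inside the joblib cache rather than the original `memmap_file`, which indicates that joblib stored its own copy of the array data.

```python
import os
import tempfile
from pathlib import Path

import numpy as np
from joblib import Memory

work_dir = Path(tempfile.mkdtemp())   # where the function writes its own memmap (illustrative)
cache_dir = Path(tempfile.mkdtemp())  # joblib's cache location (illustrative)
memory = Memory(location=str(cache_dir), mmap_mode="r", verbose=0)


@memory.cache
def memmap_return(length: int) -> np.memmap:
    # Same idea as the function above, but writing into a temporary directory.
    memmap_file = np.memmap(
        work_dir / "memmap_file",
        dtype="float64",
        mode="w+",
        shape=(length, length),
    )
    memmap_file[:] = 1.0  # fill with data so the copy is visible
    memmap_file.flush()
    return memmap_file


memmap_return(100)           # first call: joblib pickles the array data into its cache
cached = memmap_return(100)  # cache hit: reopened from joblib's own file

print("original file:         ", os.path.realpath(work_dir / "memmap_file"))
print("cached memmap backed by:", os.path.realpath(cached.filename))
```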

Is there some clean way to allow joblib to just copy that memmap file to its cache or otherwise efficiently cache this type of function?
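One possible workaround, sketched here as an assumption rather than as a joblib feature, is to let joblib cache only a file path. The cached function writes the data once to a `.npy` file whose name is derived from its arguments and returns that path. The caller then reopens the file lazily with `np.load(..., mmap_mode="r")`. The names `build_array`, `load_array` and `DATA_DIR` are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import Memory

# Hypothetical layout: large arrays live in DATA_DIR, joblib only caches their paths.
# A temporary directory is used here for the sketch; real use needs a persistent one.
DATA_DIR = Path(tempfile.mkdtemp())
memory = Memory(location=tempfile.mkdtemp(), verbose=0)


@memory.cache
def build_array(length: int) -> str:
    # Write the data exactly once, into a .npy file named after the arguments.
    path = DATA_DIR / f"array_{length}.npy"
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype="float64", shape=(length, length)
    )
    out[:] = 0.5  # stand-in for the real computation
    out.flush()
    del out  # release the write handle
    # Only this short string is pickled into the joblib cache.
    return str(path)


def load_array(length: int) -> np.memmap:
    # Reopen the (possibly cached) file as a read-only memmap; data is paged in on access.
    return np.load(build_array(length), mmap_mode="r")


arr = load_array(500)
print(type(arr).__name__, arr.shape, arr[0, 0])
```

The trade-off is that joblib no longer manages the array data. If the file in `DATA_DIR` is deleted, the cache will still return a path that no longer exists.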
