massive read only cache, missing something obvious? #1507

Open
qrdlgit opened this issue Sep 19, 2023 · 1 comment

Comments

qrdlgit commented Sep 19, 2023

I have workers that need access to a read-only cache several GB in size. When I left the cache as a global variable, joblib was very slow, so I started loading it from a pickle file on each worker spawn.

This improved performance dramatically, but it's still loading the data on each spawn!

Admittedly, it is probably served from the OS-level I/O cache in memory (so mostly skipping disk reads), but it still has to be unpickled, and there is some I/O overhead. Also, memory usage is multiplied across workers.

Is there a way to just directly access a shared read only object without the serialization/deserialization?

I thought this would be a first class use case, and maybe it is so obvious that it doesn't get great documentation.

I tried passing around a memory object, but that didn't work, and the documentation doesn't mention this as a use case.

Contributor

fcharras commented Apr 3, 2024

Have you considered using the native multiprocessing.shared_memory module?
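For illustration, here is a minimal sketch of that idea using only the standard library. The helper names `publish_cache` and `read_slice` are hypothetical, and the sketch assumes the cache can be represented as a flat bytes buffer (an arbitrary Python object graph would still need some form of serialization). The parent copies the data into shared memory once, and each worker attaches by name and reads only the part it needs.

```python
from multiprocessing import shared_memory


def publish_cache(payload: bytes) -> shared_memory.SharedMemory:
    """Parent side (hypothetical helper): copy the payload into shared memory once."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    # The block may be rounded up to a page size, so write only our bytes.
    shm.buf[:len(payload)] = payload
    return shm


def read_slice(name: str, start: int, stop: int) -> bytes:
    """Worker side (hypothetical helper): attach by name and copy out one slice."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Only the requested slice is copied; the rest of the block stays shared.
        return bytes(shm.buf[start:stop])
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()


if __name__ == "__main__":
    cache = publish_cache(b"hello shared cache")
    try:
        print(read_slice(cache.name, 0, 5))
    finally:
        cache.close()
        cache.unlink()  # free the block once all workers are done
```

With joblib, the same pattern would presumably mean passing only `cache.name` (a short string) to the delayed function instead of the cache itself, so nothing large gets pickled per task. Cleanup and resource-tracker behaviour for blocks attached from other processes differ between Python versions, so that path is worth testing on the target version.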

> I thought this would be a first class use case

Joblib mostly targets simpler use cases of embarrassingly parallel jobs, and requirements such as shared resources diverge from this initial goal. That said, we could consider more generic APIs if this kind of feature becomes more frequently requested.
