I have workers that need access to a read-only cache several GB in size. When I left the cache as a global variable, joblib was very slow, so I started loading it from a pickle on each spawn.
This improved performance dramatically, but it's still loading the data on each spawn!
Admittedly it's probably being served from the OS-level I/O cache in memory (so mostly skipping disk reads), but it still has to be unpickled, and there is some I/O overhead. Memory usage is also multiplied across workers.
Is there a way to directly access a shared read-only object without the serialization/deserialization?
I thought this would be a first-class use case, and maybe it is so obvious that it doesn't get much documentation.
I tried passing around a memory object, but that didn't work, and the documentation doesn't mention this as a use case.
Joblib mostly targets simpler use cases of embarrassingly parallel jobs, and requirements such as shared resources diverge from this initial goal. That said, we could consider more generic APIs if this kind of feature becomes more frequently requested.