Joblib Using Dask Backend 2x Slower + Not Able to Run Using processes=True (Error in Pastebin) #7165
Unanswered
omarsumadi
asked this question in
General
Replies: 0 comments
Hi Everyone,
I have been looking at the different ways to use Dask to parallelize a function across a cluster: using Joblib without Dask as the explicit backend, versus using 'dask' as the Joblib backend. Setting Dask as the backend is what the docs use, but I wanted to see what else I could do.
I have been experimenting with Scikit-learn so that I can raise this discussion with a package that has a clear example in the docs (see: https://ml.dask.org/joblib.html).
Here's the client setup:
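The setup snippet itself is not reproduced in this post, so the following is only a minimal sketch of what such a setup might look like. It assumes a local cluster started through `dask.distributed.Client`; the helper name `make_client` and the worker counts are made up for illustration.

```python
from dask.distributed import Client


def make_client(processes: bool = False, n_workers: int = 2) -> Client:
    """Start a local cluster and return a client connected to it.

    Hypothetical helper: processes=False keeps the workers as threads
    inside the current process, processes=True starts separate worker
    processes. The worker counts are illustrative assumptions.
    """
    return Client(
        processes=processes,
        n_workers=n_workers,
        threads_per_worker=1,
    )


if __name__ == "__main__":
    # When run as a script, cluster creation with processes=True generally
    # needs to sit under this guard, because worker processes may re-import
    # the main module.
    client = make_client(processes=False)
    print(client)
    client.close()
```

Switching the `processes` argument is the only difference between the thread-based and process-based variants compared below.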
I have some implementations to show for the various approaches I've been taking:
Approach 1: Using Submit with Either Threading or Processes on Scikit-Learn
Average time of ~3-4 seconds with either processes=False or processes=True, versus ~3 seconds with the Scikit-learn defaults without Dask (so just running the function).
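The code for this approach is not shown here either, so this is a rough sketch of the general pattern rather than the exact benchmark. The estimator, dataset size, and the function names `fit_forest` and `run_with_submit` are assumptions chosen for illustration.

```python
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_forest(n_samples: int = 500, n_estimators: int = 20, seed: int = 0) -> float:
    """Fit a small random forest and return its training accuracy.

    Illustrative workload only; it is not the model from the original post.
    """
    X, y = make_classification(n_samples=n_samples, random_state=seed)
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    return model.score(X, y)


def run_with_submit(client: Client) -> float:
    """Ship the whole fit to a worker as one task and wait for the result."""
    future = client.submit(fit_forest)
    return future.result()


if __name__ == "__main__":
    client = Client(processes=False, n_workers=2, threads_per_worker=1)
    print(run_with_submit(client))
    client.close()
```

Here the entire fit runs as a single task on one worker, which would be consistent with the timing staying close to the plain Scikit-learn run.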
Approach 2: Using Joblib with Dask as the Backend (~6.5-7.5 Seconds + Only Usable with Threads)
For some reason, if you set processes=True to force Dask to use processes, this does not work and gives an "event loop not started" error. Make sure you restart your kernel to test.
I wanted to stick with this implementation and test whether processes=True will work, because this seems like the correct, native approach. Does anyone know how we can run these functions with processes=True, or why this is failing for me?
Here's the error trace for starting the client with processes=True:
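For reference, the Joblib-backend pattern from the linked docs looks roughly like the sketch below. The estimator, data, and the function name `fit_with_dask_backend` are assumptions for illustration, and the sketch uses a thread-based client because that is the configuration reported to work.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_with_dask_backend(client: Client, n_estimators: int = 20, seed: int = 0) -> float:
    """Fit a random forest with Joblib dispatching its work to Dask.

    Illustrative workload only; it is not the model from the original post.
    """
    X, y = make_classification(n_samples=500, random_state=seed)
    model = RandomForestClassifier(
        n_estimators=n_estimators, n_jobs=-1, random_state=seed
    )
    # Inside this block, Joblib-parallel code in Scikit-learn sends its
    # tasks to the given Dask client instead of Joblib's default backend.
    with joblib.parallel_backend("dask", client=client):
        model.fit(X, y)
    return model.score(X, y)


if __name__ == "__main__":
    client = Client(processes=False, n_workers=2, threads_per_worker=1)
    print(fit_with_dask_backend(client))
    client.close()
```

Unlike the submit approach, each Joblib task becomes its own Dask task here, so scheduling and data-transfer overhead may account for some of the extra time on a small workload.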
https://pastebin.com/8yP1dstK
Thanks,
Omar