Fix dask predict #6412
Do you have some performance numbers?
Yes. But please let me finish testing on GKE first. It's a bit cumbersome.
Simple test, using HIGGS:
from time import time

import dask_cudf
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from xgboost import dask as dxgb

# fname (path to the HIGGS CSV) and colnames (its column names, including
# 'label') are defined elsewhere and not shown in this comment.


def dmain():
    with LocalCUDACluster() as cluster:
        print('Dashboard link:', cluster.dashboard_link)
        with Client(cluster) as client:
            dask_df = dask_cudf.read_csv(fname, header=None, names=colnames)
            y = dask_df['label']
            X = dask_df[dask_df.columns.difference(['label'])]
            dtrain = dxgb.DaskDMatrix(client, X, y)

            # Time training.
            start = time()
            output = dxgb.train(client, {'tree_method': 'gpu_hist'}, dtrain,
                                num_boost_round=10)
            end = time()
            print('Train::Duration', end - start)

            # Time prediction; persist and wait so the work actually runs.
            start = time()
            predictions = dxgb.predict(client, output, dtrain)
            predictions = client.persist(predictions)
            wait(predictions)
            predictions.mean().compute()
            end = time()
            print('Predict::Duration', end - start)
            return output['booster'], predictions


if __name__ == '__main__':
    dmain()
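For a before/after comparison, the repeated `start`/`end` bookkeeping in the script above could be pulled into a small helper. This is only a sketch using the standard library; `timed` and `speedup` are hypothetical names for illustration and are not part of XGBoost or Dask.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results):
    """Store the wall-clock duration (seconds) of the enclosed block in results[label]."""
    start = perf_counter()
    try:
        yield
    finally:
        results[label] = perf_counter() - start


def speedup(before, after):
    """Return how many times faster `after` is than `before` (both in seconds)."""
    return before / after
```

Wrapping the predict section in `with timed('Predict', results):` on each branch gives two durations that can be passed to `speedup` directly.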
@trivialfis Can you post the perf numbers before and after this patch? I want to know whether this patch improves performance.
Before:
The difference should be more significant on platforms with more workers. Right now I'm just using 2 GPUs.
I have finished testing on GKE. Could you please review?
Codecov Report
@@            Coverage Diff             @@
##           master    #6412      +/-   ##
==========================================
- Coverage   79.94%   79.92%   -0.03%
==========================================
  Files          12       12
  Lines        3476     3472       -4
==========================================
- Hits         2779     2775       -4
  Misses        697      697
Close #6407.