
Fix dask predict #6412

Merged
merged 2 commits into from Nov 20, 2020

Conversation

trivialfis
Member

Closes #6407.

@hcho3 hcho3 added the Blocking label Nov 19, 2020
@hcho3
Collaborator

hcho3 commented Nov 19, 2020

Do you have some performance numbers?

@trivialfis
Member Author

Yes. But please let me finish testing on GKE first. It's a bit cumbersome.

@trivialfis
Member Author

Simple test, using HIGGS:

Train::Duration 5.479685544967651
0.5290751
Predict::Duration 0.9758050441741943
from time import time

import dask_cudf
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from xgboost import dask as dxgb

# Assumed inputs (not shown in the original snippet): path to the HIGGS CSV
# and its column names (label first, then the 28 feature columns).
fname = 'HIGGS.csv'
colnames = ['label'] + ['feature-%02d' % i for i in range(1, 29)]


def dmain():
    with LocalCUDACluster() as cluster:
        print('Dashboard link:', cluster.dashboard_link)
        with Client(cluster) as client:
            dask_df = dask_cudf.read_csv(fname, header=None, names=colnames)
            y = dask_df['label']
            X = dask_df[dask_df.columns.difference(['label'])]
            dtrain = dxgb.DaskDMatrix(client, X, y)

            start = time()
            output = dxgb.train(client, {'tree_method': 'gpu_hist'}, dtrain,
                                num_boost_round=10)
            end = time()
            print('Train::Duration', end - start)

            start = time()
            predictions = dxgb.predict(client, output, dtrain)
            predictions = client.persist(predictions)
            wait(predictions)
            predictions.mean().compute()
            end = time()

            print('Predict::Duration', end - start)
            return output['booster'], predictions


if __name__ == '__main__':
    dmain()

@hcho3
Collaborator

hcho3 commented Nov 19, 2020

@trivialfis Can you post the perf number before and after this patch? I want to know if this patch improves performance.

@trivialfis
Member Author

Before:

Train::Duration 5.479079008102417
Predict::Duration 1.6077783107757568
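Comparing with the post-patch run posted above (Predict::Duration ≈ 0.976 s), the improvement on this 2-GPU setup works out to roughly a 1.65x predict speedup; a quick check of the arithmetic:

```python
# Predict durations reported in this thread (seconds).
before = 1.6077783107757568  # before the patch
after = 0.9758050441741943   # after the patch (from the earlier comment)

speedup = before / after
print(f"predict speedup: {speedup:.2f}x")  # prints "predict speedup: 1.65x"
```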

The difference should be more significant on platforms with more workers; right now I'm only using 2 GPUs.
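The linked issue (#6407) reports that predictions ran serially over workers, so the cost grew with the worker count. A toy illustration of why concurrent dispatch helps (plain threads standing in for per-worker predict calls, not the actual dask code path):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def predict_on_worker(partition_id):
    # Stand-in for a per-worker predict call; sleep mimics the work.
    time.sleep(0.05)
    return partition_id

n_workers = 4

# Serial dispatch: each task is awaited before the next starts,
# so total time grows linearly with the number of workers.
start = time.time()
serial_results = [predict_on_worker(i) for i in range(n_workers)]
serial_time = time.time() - start

# Concurrent dispatch: all tasks are in flight at once, so total time
# stays close to the duration of a single task.
start = time.time()
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    concurrent_results = list(pool.map(predict_on_worker, range(n_workers)))
concurrent_time = time.time() - start

print(f"serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With 4 simulated workers, the serial loop takes about four times as long as the concurrent version, and the gap widens as workers are added.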

@trivialfis
Member Author

trivialfis commented Nov 19, 2020

I have finished testing on GKE. Could you please review?

@codecov-io

codecov-io commented Nov 19, 2020

Codecov Report

Merging #6412 (c53479a) into master (c763b50) will decrease coverage by 0.03%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6412      +/-   ##
==========================================
- Coverage   79.94%   79.92%   -0.03%     
==========================================
  Files          12       12              
  Lines        3476     3472       -4     
==========================================
- Hits         2779     2775       -4     
  Misses        697      697              
Impacted Files                   Coverage            Δ
python-package/xgboost/dask.py   81.00% <100.00%>    (-0.15%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c763b50...c53479a.

@trivialfis trivialfis merged commit a7b42ad into dmlc:master Nov 20, 2020
@trivialfis trivialfis deleted the fix-dask-predict branch November 20, 2020 02:10

Successfully merging this pull request may close these issues.

dask predictions are done in serial over workers in the multinode case