[pyspark] clear temp storage to release GPU memory before each iteration #10226

wbo4958 · 2024-04-26T03:56:16Z

When the current data batch (already copied from CPU to GPU) is set on the DMatrixProxy, the previous data batch is still held while the QDM (QuantileDMatrix) is being built, which increases peak GPU memory usage. This PR therefore clears the temporary storage before each iteration.
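The effect on peak memory can be illustrated with a small pure-Python simulation. Everything in it is hypothetical: `DeviceMemory`, `Batch`, and `build_qdm` stand in for GPU allocations and the QDM build loop, and sizes are in arbitrary units. The sketch only models the ordering described above: if the old batch is released after the new one is copied, two batches coexist on the device; releasing it first keeps only one.

```python
class DeviceMemory:
    """Hypothetical tracker standing in for GPU memory accounting."""

    def __init__(self):
        self.in_use = 0
        self.peak = 0

    def alloc(self, nbytes):
        self.in_use += nbytes
        self.peak = max(self.peak, self.in_use)

    def free(self, nbytes):
        self.in_use -= nbytes


class Batch:
    """A data batch copied from host to device; allocates on construction."""

    def __init__(self, mem, nbytes):
        self.mem = mem
        self.nbytes = nbytes
        mem.alloc(nbytes)

    def release(self):
        self.mem.free(self.nbytes)


def build_qdm(batch_sizes, clear_before_next):
    """Simulate the batch loop of a QDM build and return peak device usage."""
    mem = DeviceMemory()
    held = None  # previous batch still referenced between iterations
    for nbytes in batch_sizes:
        if clear_before_next and held is not None:
            held.release()  # the fix: drop the old batch before copying the new one
            held = None
        batch = Batch(mem, nbytes)  # copy the current batch to the device
        if held is not None:
            held.release()  # without the fix, the old batch is released only now
        held = batch
    if held is not None:
        held.release()
    assert mem.in_use == 0  # every allocation was freed
    return mem.peak


print(build_qdm([4, 4, 4], clear_before_next=False))  # 8
print(build_qdm([4, 4, 4], clear_before_next=True))   # 4
```

With equal batch sizes, clearing first roughly halves the peak in this model; real savings depend on batch sizes and on whatever else the QDM build allocates.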

I tested this on my local standalone cluster with one worker node and 12 GB of GPU memory. With this PR, XGBoost can train on data of shape (4800000, 750); without it, XGBoost can train only up to (4300000, 750).
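For scale, a quick back-of-the-envelope calculation. The PR does not state the feature dtype, so this assumes dense float32 values (4 bytes each) and counts only the raw input, not the QDM's own structures. Under that assumption the raw data in both cases is larger than the 12 GB device, which fits with the QDM being built batch by batch from the iterator rather than from one resident copy.

```python
def dense_bytes(rows, cols, bytes_per_value=4):
    """Raw size of a dense matrix; float32 (4 bytes) is an assumption."""
    return rows * cols * bytes_per_value


with_pr = dense_bytes(4_800_000, 750)
without_pr = dense_bytes(4_300_000, 750)

print(f"with PR:    {with_pr / 1e9:.1f} GB")     # with PR:    14.4 GB
print(f"without PR: {without_pr / 1e9:.1f} GB")  # without PR: 12.9 GB
print(f"extra rows: {with_pr / without_pr - 1:.1%}")  # extra rows: 11.6%
```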

trivialfis · 2024-04-26T08:26:56Z

Thank you for working on this. Is it possible to make this change universal instead of Spark-only? The temporary data holds references to transformed features, such as encoded categories or converted data types, so that they are not garbage collected before the next iteration.
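A minimal sketch of why that reference is kept, and of placing the clearing in a shared base iterator so it is not Spark-specific. It assumes CPython reference counting, and `Proxy`, `BaseIter`, `Transformed`, and the `_temporary_data` attribute are illustrative names, not XGBoost's actual API. The proxy holds only a weak reference, much like a pointer into native memory, so a transformed batch survives only while the iterator keeps a strong reference to it.

```python
import weakref


class Transformed:
    """Stand-in for a converted copy of a batch (e.g. encoded categories, cast dtype)."""

    def __init__(self, values):
        self.values = [float(v) for v in values]


class Proxy:
    """Hypothetical proxy that, like a raw pointer, does not own the data it sees."""

    def __init__(self):
        self._ref = None

    def set_data(self, data):
        self._ref = weakref.ref(data)  # weak: does not keep `data` alive

    def get(self):
        return None if self._ref is None else self._ref()


class BaseIter:
    """Hypothetical base iterator; clearing here applies to every subclass."""

    def __init__(self, batches, keep_reference=True):
        self._batches = list(batches)
        self._pos = 0
        self._keep_reference = keep_reference
        self._temporary_data = None  # strong reference to the current transformed batch

    def next(self, proxy):
        # Release the previous batch's transformed data before producing the next one.
        self._temporary_data = None
        if self._pos == len(self._batches):
            return False  # no more batches
        transformed = Transformed(self._batches[self._pos])
        proxy.set_data(transformed)
        if self._keep_reference:
            # Keep the converted data alive until the next call to next().
            self._temporary_data = transformed
        self._pos += 1
        return True


proxy = Proxy()
it = BaseIter([[1, 2], [3, 4]])
it.next(proxy)
print(proxy.get().values)  # [1.0, 2.0]
```

With `keep_reference=False` the converted batch is freed as soon as `next()` returns, leaving the proxy with a dangling reference; with it, the batch lives exactly one iteration and is dropped at the start of the next call.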

Commits:

[pyspark] clear temp storage to release GPU memory before each iteration (89cdb19)

Merge branch 'master' into fix-peak-mem (055b7c8)
