[GraphBolt] Update `ItemSampler` #7408

Skeleton003 · 2024-05-15T06:18:14Z

Description

Update ItemSampler to support correct stochastic sharding across distributed groups.
Modify the logic of ItemSet.__getitem__() when index is an iterable of int.
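
For readers unfamiliar with the idea, here is a minimal sketch of what stochastic sharding across ranks and indexing with an iterable of ints can look like. It is not the code in this PR: `shard_indices` and `gather` are hypothetical helpers, plain Python lists stand in for tensors, and the `seed + epoch` seeding scheme is an assumption chosen for illustration.

```python
import random


def shard_indices(num_items, world_size, rank, seed, epoch, shuffle=True):
    """Return the item indices assigned to `rank` for one epoch.

    Hypothetical helper, not part of the GraphBolt API.
    """
    indices = list(range(num_items))
    if shuffle:
        # Assumption: every rank seeds its generator with the same value
        # (seed + epoch), so all ranks compute the same permutation and
        # the resulting shards cannot overlap.
        random.Random(seed + epoch).shuffle(indices)
    # Strided split: rank r takes positions r, r + world_size, ...
    return indices[rank::world_size]


def gather(items, index):
    """Index a tuple of equal-length sequences with an iterable of ints."""
    index = list(index)
    return tuple([column[i] for i in index] for column in items)


if __name__ == "__main__":
    world_size = 4
    shards = [
        shard_indices(10, world_size, rank, seed=0, epoch=1)
        for rank in range(world_size)
    ]
    # Every item appears in exactly one shard.
    print(sorted(i for shard in shards for i in shard))
    # Indexing a (seeds, labels) pair with an iterable of ints.
    print(gather((["a", "b", "c"], [0, 1, 2]), [2, 0]))
```

Because the permutation depends only on the shared seed and the epoch counter, each rank can compute its shard locally without communicating with the others.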

Benchmark: https://docs.google.com/document/d/1Pzk2PJoFtTZSu17wTXVK4mqvfrMLAj2xK6fcGC1pwEg/edit?usp=sharing

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
I've leveraged the tools to beautify the Python and C++ code.
The PR is complete and small. Read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests and documentation could be exempted).
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Related issue is referred in this PR
If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot · 2024-05-15T06:18:41Z

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch];
For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

mfbalin · 2024-05-15T06:29:23Z

The fact that runtime performance is unchanged is good. However, to verify whether the old or the new implementation is more performant, we need to track the CPU utilization. Since ItemSampler and the rest of the sampling pipeline run concurrently, runtime alone is not enough information to determine that.

dgl-bot · 2024-05-15T06:50:38Z

Commit ID: 0134ddf

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Skeleton003 · 2024-05-15T07:39:45Z

> The fact that runtime performance is unchanged is good. However, to verify whether the old or the new implementation is more performant, we need to track the CPU utilization. Since ItemSampler and the rest of the sampling pipeline run concurrently, runtime alone is not enough information to determine that.

Sure, could you please give me some guidance on how to do that?

mfbalin · 2024-05-15T07:51:35Z

> > The fact that runtime performance is unchanged is good. However, to verify whether the old or the new implementation is more performant, we need to track the CPU utilization. Since ItemSampler and the rest of the sampling pipeline run concurrently, runtime alone is not enough information to determine that.
>
> Sure, could you please give me some guidance on how to do that?

The simplest way is to monitor htop -d5 or nvtop to see what the CPU utilization is. If you want to be more precise, you could also insert timing code into ItemSampler to see how long each call takes. It may also make sense to write a regression benchmark so that we can monitor its runtime. We can have a sampling pipeline containing only the ItemSampler and benchmark it.
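
As a rough sketch of the timing idea, the snippet below consumes an iterable and reports both wall-clock time and CPU time for the current process. `profile_iterable` and `dummy_sampler` are hypothetical names; `dummy_sampler` is only a stand-in for a pipeline that contains nothing but the ItemSampler, and `time.process_time()` does not count CPU time spent in separate worker processes.

```python
import time


def profile_iterable(iterable):
    """Consume `iterable` and report wall-clock and CPU time (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    num_batches = 0
    for _ in iterable:
        num_batches += 1
    wall = time.perf_counter() - wall_start
    # CPU time of this process only (user + system, all threads).
    cpu = time.process_time() - cpu_start
    return {"batches": num_batches, "wall_s": wall, "cpu_s": cpu}


def dummy_sampler(num_items, batch_size):
    """Stand-in for an ItemSampler-only pipeline; yields batches of indices."""
    for start in range(0, num_items, batch_size):
        yield list(range(start, min(start + batch_size, num_items)))


if __name__ == "__main__":
    stats = profile_iterable(dummy_sampler(1_000_000, 1024))
    print(stats)
```

Comparing `cpu_s` between the old and new implementations gives a signal that wall-clock time hides when the sampler overlaps with the rest of the pipeline.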

python/dgl/graphbolt/itemset.py

Rhett-Ying

Code looks clean and good to me.

One more thing: please benchmark on larger datasets with larger batch sizes, such as ogbn-papers100M, and on heterogeneous and link prediction datasets (to measure the performance of indexing on a tuple of tensors). Let's make sure it performs well on the most common datasets.

python/dgl/graphbolt/item_sampler.py

python/dgl/graphbolt/itemset.py

dgl-bot · 2024-05-16T05:44:56Z

Commit ID: 428d47a3b483c1e7f9b403124a56763afbac038b

Build ID: 2

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot · 2024-05-16T08:04:15Z

Commit ID: fa3ccb337322150991a773951da7f74ed897e8a7

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot · 2024-05-17T04:20:29Z

Commit ID: bacdfb1c6088d51474c3581e84ef998da9d6185b

Build ID: 4

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Skeleton003 · 2024-05-20T05:23:42Z

@mfbalin Thank you for your valuable suggestions, but considering the scope of this PR, I'd like to defer them to another PR.

@Rhett-Ying If everything looks good to you, please approve so I can move on.

Rhett-Ying · 2024-05-21T00:59:55Z

@Skeleton003 According to the doc, examples/sampling/graphbolt/link_prediction.py runs a bit slower. Please keep an eye on the regression results.

dgl-bot · 2024-05-21T04:25:57Z

Commit ID: 1d69f6b7b4119f7cefb5801ba4ea01c94c573a4f

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot · 2024-05-21T04:40:50Z

Commit ID: 4987175ffa9db59998db8797ced119ee91c8d946

Build ID: 6

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot · 2024-05-21T05:11:57Z

Commit ID: a4b3f08b42d47b5105adaa6190a7a14593bc9ca6

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Ubuntu added 3 commits May 15, 2024 04:49

1

e97be00

use buffer

83f9e61

update docstring

0134ddf

Skeleton003 requested a review from Rhett-Ying May 15, 2024 06:18

mfbalin reviewed May 15, 2024

python/dgl/graphbolt/itemset.py (resolved)

Rhett-Ying reviewed May 16, 2024

python/dgl/graphbolt/item_sampler.py (4 resolved threads)

Ubuntu added 3 commits May 16, 2024 04:52

add comment to clarify why self._epoch needed

e986fe2

wrap into method

0599d59

typo

7875aeb

mfbalin reviewed May 16, 2024

python/dgl/graphbolt/itemset.py (outdated, resolved)

mfbalin reviewed May 16, 2024

python/dgl/graphbolt/itemset.py (resolved)

rm unnecessary code

fb7fe73

tensor

fb1709c

Skeleton003 requested a review from Rhett-Ying May 20, 2024 16:11

Rhett-Ying approved these changes May 21, 2024

Ubuntu added 3 commits May 21, 2024 04:06

docstring

4c02630

rm buffer doc

91143be

reproducibility

53bb901

docstring for seed

d93291c

Skeleton003 merged commit 3574bff into dmlc:master May 21, 2024
2 checks passed

Skeleton003 mentioned this pull request Jun 3, 2024

[GraphBolt] Add ItemSet/Dict4 #7382

Closed

9 tasks
