-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed Sampling in cuGraph-PyG #4384
base: branch-24.06
Are you sure you want to change the base?
Conversation
…to bug_post_processing
…to bug_post_processing
…to bug_post_processing
…to bug_post_processing
…to bug_post_processing
…to bug_post_processing
…to bug_post_processing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content of this file was migrated to graph_sage_mg.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSR is the fastest compression based on benchmarking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MNMG example will be added along with the new WG feature store, and will use the new WG feature store to avoid replicating data across processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with some comments and questions. The overall logic looks reasonable and trust you with matching the detailed interface of PyG's native Graph- and FeatureStore.
sz = torch.tensor(num_vertices[vtype], device="cuda") | ||
torch.distributed.all_reduce(sz, op=torch.distributed.ReduceOp.MAX) | ||
num_vertices[vtype] = int(sz) | ||
return num_vertices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cache num_vertices
and num_edges
since they require collective communications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cache the offsets, which are more frequently used, but I might add caching the vertex and edge counts if they are used more often. So maybe in the future.
/merge |
CI failed with |
Distributed sampling in cuGraph-PyG. Also renames the existing API to clarify that it is dask based.
Adds a dependency on
tensordict
forcuGraph-PyG
which supports the newTensorDictFeatureStore
.Also no longer installs
torch-cluster
andtorch-spline-conv
in CI for testing since that results in anImportError
and neither of those packages are needed.Requires PyG 2.5. Should be merged after #4335
Merge after #4355
Closes #4248
Closes #4249
Closes #3383
Closes #3942
Closes #3836
Closes #4202
Closes #4051
Closes #4326
Closes #4252
Partially addresses #3805