PyG Remote Backend Based on GraphScope #3739

Open
LiSu opened this issue Apr 23, 2024 · 1 comment
@LiSu
Collaborator

LiSu commented Apr 23, 2024

GraphScope uses graphlearn-for-pytorch (GLTorch), a distributed GNN training framework, to support large-scale distributed GNN training. GLTorch is compatible with PyG at the model layer, so PyG-based GNN training can be extended to large distributed graphs.

To support training GNNs on graphs that exceed the available memory of a single machine, PyG has introduced a pluggable Remote Backend mechanism. Through abstractions such as FeatureStore and GraphStore, this mechanism lets third-party graph storage engines integrate with PyG. The FeatureStore gives access to node/edge features stored remotely, while the GraphStore gives access to graph structure stored remotely. This project aims to implement a PyG Remote Backend based on GraphScope, so that PyG users have a user-friendly way to run distributed GNN training with GraphScope.
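The split between the two abstractions can be illustrated with a small, dependency-free sketch. It does not use the PyG or GraphScope APIs: every class and function name below (`ToyRemoteStorage`, `ToyFeatureStore`, `ToyGraphStore`, `sample_one_hop`) is hypothetical, plain dicts and lists stand in for the remote engine and for tensors, and the relation is assumed to connect nodes of a single type. The real abstractions live in `torch_geometric.data` as `FeatureStore` and `GraphStore`, and their actual method signatures should be taken from the PyG documentation.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]  # (source type, relation, destination type)


class ToyRemoteStorage:
    """Hypothetical stand-in for a remote storage engine (plain dicts here)."""

    def __init__(self) -> None:
        self.features: Dict[Tuple[str, str], List[List[float]]] = {}
        self.edges: Dict[EdgeType, Tuple[List[int], List[int]]] = {}


class ToyFeatureStore:
    """Feature access only: rows are addressed by (group, attribute, index)."""

    def __init__(self, remote: ToyRemoteStorage) -> None:
        self.remote = remote

    def put_tensor(self, rows: List[List[float]], group_name: str, attr_name: str) -> None:
        self.remote.features[(group_name, attr_name)] = rows

    def get_tensor(self, group_name: str, attr_name: str, index: List[int]) -> List[List[float]]:
        rows = self.remote.features[(group_name, attr_name)]
        return [rows[i] for i in index]


class ToyGraphStore:
    """Structure access only: edges are addressed by edge type."""

    def __init__(self, remote: ToyRemoteStorage) -> None:
        self.remote = remote

    def put_edge_index(self, src: List[int], dst: List[int], edge_type: EdgeType) -> None:
        if len(src) != len(dst):
            raise ValueError("src and dst must have the same length")
        self.remote.edges[edge_type] = (list(src), list(dst))

    def get_edge_index(self, edge_type: EdgeType) -> Tuple[List[int], List[int]]:
        return self.remote.edges[edge_type]


def sample_one_hop(graph_store, feature_store, edge_type, seeds, attr_name):
    """Expand seeds by one hop via the graph store, then fetch only the
    features of the touched nodes from the feature store.

    Assumes source and destination node types are the same.
    """
    src, dst = graph_store.get_edge_index(edge_type)
    seed_set = set(seeds)
    nodes = set(seeds)
    for s, d in zip(src, dst):
        if s in seed_set:
            nodes.add(d)
    ordered = sorted(nodes)
    feats = feature_store.get_tensor(edge_type[0], attr_name, ordered)
    return ordered, feats


# Usage: a tiny citation graph with one scalar feature per node.
remote = ToyRemoteStorage()
feature_store = ToyFeatureStore(remote)
graph_store = ToyGraphStore(remote)

cites = ("paper", "cites", "paper")
feature_store.put_tensor([[0.0], [1.0], [2.0]], "paper", "x")
graph_store.put_edge_index([0, 0, 1], [1, 2, 2], cites)

nodes, feats = sample_one_hop(graph_store, feature_store, cites, seeds=[0], attr_name="x")
print(nodes, feats)
```

The point of the sketch is that the training loop only ever asks for the subgraph and the feature rows it needs, so the full graph never has to fit in the trainer's memory. A GraphScope-backed implementation would presumably replace the dict lookups with calls into GraphScope's storage.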

Deliverables:

  • Implement the PyG FeatureStore and GraphStore abstractions within GraphScope
  • Complete the end-to-end integration of GraphScope and PyG via the Remote Backend
@LiSu
Collaborator Author

LiSu commented Apr 23, 2024

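GraphScope supports large-scale distributed GNN training through the distributed GNN training framework graphlearn-for-pytorch (GLTorch). GLTorch is compatible with PyG at the model layer and extends PyG GNN training to large distributed graphs. To support training GNNs on graphs larger than a machine's available memory, PyG introduced a pluggable Remote Backend mechanism: abstractions such as FeatureStore and GraphStore let third-party graph storage engines integrate with PyG. FeatureStore lets users work with node/edge features stored remotely, and GraphStore lets users work with graph structure information stored remotely; together they allow GNN training to scale on top of remote storage. This project aims to implement a PyG Remote Backend based on GraphScope, further simplifying the integration between GraphScope and PyG and giving PyG users a friendly product experience for distributed GNN training on GraphScope.

Deliverables:

  • Based on the current GLTorch architecture, design an implementation plan for FeatureStore and GraphStore on GraphScope
  • Complete the overall Remote Backend implementation and provide a distributed training example on GraphScope based on the PyG Remote Backend

Difficulty: Beginner
Technical requirements: proficiency in Python; familiarity with C++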
