Scalable Graph Convolutional Network Training on Distributed-Memory Systems

Demirci, Gunduz Vehbi; Haldar, Aparajita; Ferhatosmanoglu, Hakan

arXiv:2212.05009 (cs)

[Submitted on 9 Dec 2022 (v1), last revised 13 Dec 2022 (this version, v2)]

Abstract: Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed-memory systems necessary. Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges. We propose a highly parallel training algorithm that scales to large processor counts. In our solution, the large adjacency and vertex-feature matrices are partitioned among processors. We exploit the vertex partitioning of the graph to use non-blocking point-to-point communication operations between processors for better scalability. To further minimize the parallelization overheads, we introduce a sparse matrix partitioning scheme based on a hypergraph partitioning model for full-batch training. We also propose a novel stochastic hypergraph model to encode the expected communication volume in mini-batch training. We show the merits of the hypergraph model, previously unexplored for GCN training, over the standard graph partitioning model, which does not accurately encode the communication costs. Experiments performed on real-world graph datasets demonstrate that the proposed algorithms achieve considerable speedups over alternative solutions. The savings in communication cost become even more pronounced at large processor counts. The performance benefits carry over to deeper GCNs with more layers and to billion-scale graphs.
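
The abstract's claim that a hypergraph model encodes communication cost more accurately than a graph model can be illustrated with a small sketch. It assumes a row-wise (vertex) partition in which the processor owning vertex v also owns row v of the adjacency matrix A and of the feature matrix H, so computing row v of A·H needs H[u] for every neighbor u. A remote row is sent once per destination processor, which matches the connectivity-minus-one metric of a column-net hypergraph, while a graph edge cut charges every cut nonzero separately. The function names, the adjacency-list input, and the independent per-vertex sampling probabilities q used for the mini-batch estimate are hypothetical choices for illustration. This is neither the authors' implementation nor their stochastic hypergraph formulation.

```python
import math
from collections import defaultdict


def comm_volume(adj, part):
    """Exact number of H rows sent in one row-parallel Z = A @ H.

    adj[v] lists each u with A[v, u] != 0 (once); part[v] is the owner of
    row v of both A and H. Row u of H is sent once to every other processor
    that owns a row v needing it: the connectivity-minus-one of column net u.
    """
    needers = defaultdict(set)  # u -> processors owning some v with A[v, u] != 0
    for v, nbrs in adj.items():
        for u in nbrs:
            needers[u].add(part[v])
    return sum(len(procs | {part[u]}) - 1 for u, procs in needers.items())


def edge_cut(adj, part):
    """Graph-model estimate: one transferred row per cut nonzero A[v, u]."""
    return sum(1 for v, nbrs in adj.items() for u in nbrs if part[u] != part[v])


def expected_comm_volume(adj, part, q):
    """Expected rows sent if each vertex v is independently in the mini-batch
    with probability q[v] (a simplifying, hypothetical sampling assumption).
    Row u reaches remote processor k iff at least one sampled v on k has
    A[v, u] != 0."""
    probs = defaultdict(lambda: defaultdict(list))  # u -> k -> [q[v], ...]
    for v, nbrs in adj.items():
        for u in nbrs:
            probs[u][part[v]].append(q[v])
    total = 0.0
    for u, groups in probs.items():
        for k, ps in groups.items():
            if k != part[u]:
                # P(at least one sampled neighbor of u lives on processor k)
                total += 1.0 - math.prod(1.0 - p for p in ps)
    return total


# Star example: hub 0 on processor 0, leaves 1..3 on processor 1.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
part = {0: 0, 1: 1, 2: 1, 3: 1}
print(comm_volume(adj, part), edge_cut(adj, part))  # 4 6
print(expected_comm_volume(adj, part, {v: 0.5 for v in adj}))  # 2.375
```

In the star example, the hub's feature row crosses to processor 1 only once even though three of its neighbors live there. The exact volume is therefore 4 rows, while the edge-cut count is 6. With every q[v] set to 1.0, the expected-volume estimate reduces to the exact full-batch volume.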

Comments:	To appear in PVLDB'22
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2212.05009 [cs.LG]
	(or arXiv:2212.05009v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.05009

Submission history

From: Aparajita Haldar
[v1] Fri, 9 Dec 2022 17:51:13 UTC (1,624 KB)
[v2] Tue, 13 Dec 2022 12:21:25 UTC (1,624 KB)