Abstract

Atomic structure prediction and associated property calculations are the bedrock of chemical physics. Because high-fidelity ab initio modeling techniques for computing structure and properties can be prohibitively expensive, there is strong motivation to develop machine-learning (ML) models that make these predictions more efficiently. Training graph neural networks over large atomistic databases introduces unique computational challenges, such as the need to process millions of small graphs of variable size and to support communication patterns distinct from those that arise when learning over large graphs such as social networks. We demonstrate a novel hardware-software codesign approach to scale up the training of atomistic graph neural networks (GNNs) for structure and property prediction. First, to eliminate the redundant computation and memory associated with alternative padding techniques and to improve throughput by minimizing communication, we formulate the effective coalescing of batches of variable-size atomistic graphs as a bin packing problem and introduce a hardware-agnostic algorithm to pack these batches. In addition, we propose hardware-specific optimizations, including a planner and vectorization for the gather-scatter operations targeting Graphcore's Intelligence Processing Unit (IPU), as well as model-specific optimizations such as merged communication collectives and an optimized softplus. Putting these together, we demonstrate the effectiveness of the proposed codesign approach by providing an implementation of a well-established atomistic GNN on Graphcore IPUs. We evaluate the training performance on multiple atomistic graph databases with varying graph counts, sizes, and sparsity. We demonstrate that such a codesign approach can reduce the training time of atomistic GNNs and can improve their performance by up to 1.5× compared to the baseline implementation of the model on the IPUs. Additionally, we compare our IPU implementation with an Nvidia GPU-based implementation and show that our atomistic GNN implementation on the IPUs runs 1.8× faster on average than on the GPUs.
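To make the batch-packing idea concrete, the following is a minimal sketch of how variable-size graphs could be coalesced into fixed-capacity packs. It uses the classic first-fit-decreasing heuristic for bin packing, which is an assumption chosen for illustration and not necessarily the algorithm introduced in the paper. The names `pack_graphs`, `padding_fraction`, and the `max_nodes` capacity are hypothetical.

```python
from typing import List, Sequence


def pack_graphs(node_counts: Sequence[int], max_nodes: int) -> List[List[int]]:
    """Group graph indices into packs holding at most `max_nodes` nodes each.

    Illustrative first-fit-decreasing heuristic (an assumption, not the
    paper's algorithm): visit graphs from largest to smallest and place
    each one in the first pack that still has room.
    """
    order = sorted(range(len(node_counts)),
                   key=lambda i: node_counts[i], reverse=True)
    packs: List[List[int]] = []  # graph indices assigned to each pack
    free: List[int] = []         # remaining node capacity of each pack
    for i in order:
        n = node_counts[i]
        if n > max_nodes:
            raise ValueError(f"graph {i} has {n} nodes, above capacity {max_nodes}")
        for p, room in enumerate(free):
            if n <= room:
                packs[p].append(i)
                free[p] -= n
                break
        else:
            # No existing pack can hold this graph, so open a new one.
            packs.append([i])
            free.append(max_nodes - n)
    return packs


def padding_fraction(node_counts: Sequence[int],
                     packs: List[List[int]], max_nodes: int) -> float:
    """Fraction of node slots across all packs that would be padding."""
    used = sum(node_counts[i] for pack in packs for i in pack)
    return 1.0 - used / (len(packs) * max_nodes)


if __name__ == "__main__":
    sizes = [5, 3, 9, 12, 7, 4, 18, 6]  # made-up node counts per graph
    packs = pack_graphs(sizes, max_nodes=32)
    print(packs, padding_fraction(sizes, packs, max_nodes=32))
```

Each pack has the same fixed shape, so a device that requires static tensor sizes would pad only the leftover slots in a pack instead of padding every graph to the size of the largest one. A real pipeline would likely also bound the number of edges and graphs per pack, which this sketch omits.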
