Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators

Toshihiro Hanawa, Taisuke Boku, Yuetsu Kodama, Mitsuhisa Sato

doi:10.1109/ipdpsw.2013.226

Abstract

In recent years, heterogeneous clusters using accelerators have been widely used in high performance computing systems. In such clusters, inter-node communication among accelerators requires several memory copies via CPU memory, and the resulting communication latency causes severe performance degradation. To address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture, which reduces the communication latency between accelerators on different nodes. In addition, we are advancing the HA-PACS project at the Center for Computational Sciences, University of Tsukuba, to build the HA-PACS base cluster system as a commodity GPU cluster and to develop an experimental system based on the TCA architecture as a proprietary interconnection network connecting accelerators across nodes. In the present paper, we describe the TCA architecture and the design and implementation of PEACH2, the chip that realizes the TCA architecture. We also evaluate the functionality and basic performance of the PEACH2 chip. The results demonstrate that the PEACH2 chip achieves 93% of the theoretical peak performance and a latency between adjacent nodes of approximately 0.8 μs.

Full Text