Recent years have seen growing interest in graph computations, a trend driven by the large number of potential application areas. Moreover, the increasing computational capabilities of modern computers have shifted attention from the theory of graph algorithms to the search for the best methods of realizing them in practice. These factors, in turn, have led to ideas such as hardware dedicated to graph computation, such as the Graphcore Intelligence Processing Unit (IPU). Interestingly, Graphcore systems are a hardware implementation of the Bulk Synchronous Parallel paradigm, which had seemed to be a mostly theoretical concept from the end of the last century. In this context, the question to be addressed experimentally is: how well do Graphcore systems perform in comparison with the standard systems used to run graph algorithms, i.e., CPUs and GPUs? To provide a partial answer to this broad question, this contribution uses the PageRank, Single Source Shortest Path, and Breadth-First Search algorithms to compare the performance of IPU-deployed implementations with that of other parallel architectures. The results obtained clearly show that the Graphcore IPU outperforms the other devices for the studied heterogeneous algorithms and currently provides best-in-class execution times for a range of graph sizes and densities.