Effective On-Chip Communication for Message Passing Programs on Multi-Core Processors

Joonmoo Huh, Deokwoo Lee

doi:10.3390/electronics10212681

Abstract

Shared memory is the most popular parallel programming model for multi-core processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory and message passing change, and we argue that message passing becomes a viable, high-performing parallel programming model. To test this hypothesis, we compare a shared memory architecture with a new message passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, the fundamental behaviors of the applications studied in this work, when optimized for both models, are very similar, and both versions can execute efficiently on multicore architectures even though many of the implementations differ. Furthermore, if the hardware is tuned for message passing by supporting bulk message transfer and eliminating unnecessary coherence overheads, and if effective support is available for global operations, then some applications would perform much better on a message passing architecture. Leveraging these insights, we design a message passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing outperforms its shared memory counterpart on many of the applications because of the advantages of the message passing hardware over cache coherence. In the best case, message passing achieves a 34% increase in speed over its shared memory counterpart, and it achieves a 10% increase on average. In the worst case, message passing is slower in two applications, CG (conjugate gradient) and FT (Fourier transform), because it does not handle their data sharing patterns as well as shared memory does. Overall, our analysis demonstrates the importance of considering message passing as a high-performing, hardware-supported programming model on future multicore architectures.

Highlights

From a programming perspective, multicore is essentially synonymous with shared memory
We have studied the differences in the behaviors of message passing programs and shared memory programs
If messaging support is added to multicore architectures, the advantages of message passing can be exploited to create an efficient and high-performing alternative to shared memory programming

Summary

Introduction

From a programming perspective, multicore is essentially synonymous with shared memory. The ubiquity of this assumption would lead one to conclude that message passing is not a reasonable choice for programming these architectures. Few studies have rigorously investigated this assumption since the dawn of modern multi-core processors. Message passing paradigms such as MPI (message passing interface) have been around for decades and are convenient for distributed systems because nodes are connected over networks, and MPI abstracts communication into simple messages. In the case of multi-core processors, the message passing abstraction has not solved a critical need, at least not yet, and may even be a hindrance. One reason for this is that message passing

Methods

Results

Conclusion