Abstract

The advent of 3D memory stacking technology, which integrates a logic layer with stacked memory dies, is one of the most promising memory technologies for mitigating the memory wall problem by leveraging the concept of near-memory processing (NMP). By processing data locally within the logic layer of stacked memory, a variety of emerging big-data applications can achieve significant performance and energy-efficiency gains. Various NMP logic layer architectures have been studied to exploit the advantages of stacked memory. Although previous NMP studies have demonstrated significant acceleration of specific kernel operations, an NMP-based system whose logic layer supports only a few specific kernels can suffer performance and energy-efficiency degradation due to the substantial communication overhead between the host processor and the NMP stack. In this article, we first identify the kernel operations that can most improve the performance of NMP-based systems across diverse emerging applications, and then analyze the architectures needed to process the extracted kernels efficiently. This analysis confirms that three categories of processing engines are required in the NMP logic layer to efficiently support a wide range of emerging applications. We therefore propose the Triple Engine Processor (TEP), a heterogeneous near-memory processor with three types of computing engines: an in-order core, a coarse-grained reconfigurable architecture (CGRA), and dedicated hardware. The proposed TEP provides about 3.4 times higher performance and 33% greater energy savings than the baseline 3D memory system.
