Abstract

Energy consumption is a critical issue in embedded systems design. A basic way for an embedded processor system to be energy efficient is to complete execution early and consume little power. Multi-threaded processors interleave thread execution, reducing the processor's idle time and hence the overall execution time. Caches moderate the long and power-hungry external memory accesses, allowing for both performance improvement and power saving. However, when the two techniques are applied together, the efficiency of the design may not be as high as expected. Multi-threaded execution can adversely interfere with cache operations, increasing cache misses and leading to overall performance loss and large energy consumption. This paper presents a microarchitecture-level design that enables the synergy of the two design techniques for embedded processors. In particular, we focus on a single-pipeline processor with an instruction cache for applications that offer embarrassing parallelism. Such a design can be used as a building-block processor for large computing systems. We propose a thread synchronization and cache locking scheme that allows cached instructions to be maximally reused by all threads. Experiments on a set of applications show that for designs with a 1-way cache and 300 MB memory, an average of 26% of the baseline energy can be saved, and the energy savings become more significant as the memory size is increased.
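The core intuition behind the proposed scheme — that synchronizing the threads so they advance through the same code region lets one cache fill serve every thread — can be illustrated with a toy simulation. This sketch is ours, not the paper's cycle-accurate model; the cache size, code footprint, thread count and schedules are illustrative assumptions only.

```python
# Toy illustration (our own sketch, not the paper's model): compare
# instruction-cache misses when threads run unsynchronized at staggered
# phases vs. synchronized on the same instruction region.
CACHE_LINES = 16   # direct-mapped (1-way) instruction cache
CODE_LEN = 64      # instruction addresses 0..63; footprint > cache
THREADS = 4

def simulate(schedule):
    """Count misses for a sequence of (thread, pc) fetches on a direct-mapped cache."""
    cache = [None] * CACHE_LINES       # one tag per cache line
    misses = 0
    for _, pc in schedule:
        line = pc % CACHE_LINES
        if cache[line] != pc:          # tag mismatch: miss, fill the line
            cache[line] = pc
            misses += 1
    return misses

# Unsynchronized: round-robin switching with threads at staggered phases,
# so fetches from different code regions thrash the same cache lines.
unsync = [(t, (pc + t * (CODE_LEN // THREADS)) % CODE_LEN)
          for pc in range(CODE_LEN) for t in range(THREADS)]

# Synchronized: all threads advance through the same instruction together,
# so each line is filled once and then reused by every other thread.
sync = [(t, pc) for pc in range(CODE_LEN) for t in range(THREADS)]

print(simulate(unsync), "misses unsynchronized")  # every fetch conflicts
print(simulate(sync), "misses synchronized")      # only cold misses remain
```

In this toy run the staggered schedule misses on every one of the 256 fetches, while the synchronized schedule incurs only the 64 cold misses, which is the reuse effect the proposed cache locking scheme is designed to preserve.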

Highlights

  • It is ideal for a pipelined processor to run at its full speed when executing an application

  • Since instruction cache misses have a considerably higher performance impact than data cache misses [1], owing to frequent instruction fetches, we focus this study on the instruction cache

  • We introduce a novel cache design with a cache locking scheme that provides a near-100% cache hit rate for synchronized thread executions, and we apply the thread switching design introduced in [6], which parallelizes tasks in thread scheduling and tucks execution switching into the processor pipeline to eliminate the thread-switching performance overhead


Summary

Introduction

It is ideal for a pipelined processor to run at its full speed when executing an application. We target applications that offer embarrassing parallelism, where the same code can be executed by a number of independent threads on different data sets. Such applications can be found in real-world computing problems such as encryption [2], scientific calculation [3], multimedia processing [4] and image processing [5]. Our design is small and low cost, yet offers high performance and energy efficiency.

