Abstract

Energy consumption is a critical issue in embedded systems design. A basic way for an embedded processor system to be energy efficient is to complete execution early and consume little power. Multi-threaded processors interleave thread execution, reducing the processor's idle time and hence the overall execution time. Caches moderate the long and power-hungry external memory accesses, allowing for both performance improvement and power saving. However, when the two techniques are applied together, the efficiency of the design may not be as high as expected. Multi-threaded execution can adversely interfere with cache operations, increasing cache misses and leading to overall performance loss and large energy consumption. This paper presents a microarchitecture-level design that enables the synergy of the two design techniques for embedded processors. In particular, we focus on a single-pipeline processor with an instruction cache for applications that offer embarrassing parallelism. Such a design can be used as a building-block processor for large computing systems. We propose a thread synchronization and cache locking scheme that allows cached instructions to be maximally reused by all threads. Experiments on a set of applications show that for designs with a 1-way cache and 300 MB memory, an average of 26% of the baseline energy can be saved, and the energy savings become more significant as the memory size is increased.
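The core intuition behind the proposed scheme — that synchronizing the threads so they advance through the same code region lets one cache fill serve every thread — can be illustrated with a toy simulation. This sketch is ours, not the paper's cycle-accurate model; the cache size, code footprint, thread count and schedules are illustrative assumptions only.

```python
# Toy illustration (our own sketch, not the paper's model): compare
# instruction-cache misses when threads run unsynchronized at staggered
# phases vs. synchronized on the same instruction region.
CACHE_LINES = 16   # direct-mapped (1-way) instruction cache
CODE_LEN = 64      # instruction addresses 0..63; footprint > cache
THREADS = 4

def simulate(schedule):
    """Count misses for a sequence of (thread, pc) fetches on a direct-mapped cache."""
    cache = [None] * CACHE_LINES       # one tag per cache line
    misses = 0
    for _, pc in schedule:
        line = pc % CACHE_LINES
        if cache[line] != pc:          # tag mismatch: miss, fill the line
            cache[line] = pc
            misses += 1
    return misses

# Unsynchronized: round-robin switching with threads at staggered phases,
# so fetches from different code regions thrash the same cache lines.
unsync = [(t, (pc + t * (CODE_LEN // THREADS)) % CODE_LEN)
          for pc in range(CODE_LEN) for t in range(THREADS)]

# Synchronized: all threads advance through the same instruction together,
# so each line is filled once and then reused by every other thread.
sync = [(t, pc) for pc in range(CODE_LEN) for t in range(THREADS)]

print(simulate(unsync), "misses unsynchronized")  # every fetch conflicts
print(simulate(sync), "misses synchronized")      # only cold misses remain
```

In this toy run the staggered schedule misses on every one of the 256 fetches, while the synchronized schedule incurs only the 64 cold misses, which is the reuse effect the proposed cache locking scheme is designed to preserve.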

Highlights

  • It is ideal for a pipelined processor to run at its full speed when executing an application

  • Since instruction cache misses have a considerably higher performance impact than data cache misses [1], owing to frequent instruction fetches, we focus this study on the instruction cache

  • We introduce a novel cache design with a cache locking scheme that provides a near-100% cache hit rate for synchronized thread executions, and we apply the thread switching design introduced in [6], which parallelizes tasks in thread scheduling and tucks execution switching into the processor pipeline to eliminate the thread-switching performance overhead


Summary

Introduction

It is ideal for a pipelined processor to run at its full speed when executing an application. We target applications that offer embarrassing parallelism, where the same code can be executed by a number of independent threads on different data sets. Such applications can be found in real-world computing problems such as encryption [2], scientific calculation [3], multimedia processing [4] and image processing [5]. Our design is small and low cost, yet offers high performance and energy efficiency.

