Abstract

High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core microcontrollers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, combining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. In this work, we tackle one of the most critical energy-efficiency bottlenecks for these architectures: the instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working sets. The results show that the shared cache architectures efficiently execute a much wider set of applications (including those featuring large memory footprints and irregular access patterns) with much smaller area and much better energy efficiency than the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40 percent, energy efficiency by up to 20 percent, and energy $\times$ area efficiency by up to 30 percent with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20 percent better energy $\times$ area efficiency than the multi-port, and up to 30 percent better energy efficiency than the private cache.
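The SP-versus-MP tradeoff described above can be illustrated with a toy contention model: with single-port banks behind a crossbar, concurrent fetches that map to the same bank must serialize, while multi-port banks serve every core each cycle (at a higher area cost). The sketch below is purely illustrative and not from the paper; the core count, bank count, word-interleaved bank mapping, and fetch trace are all invented, and it pessimistically serializes even same-address fetches, which a broadcast-capable crossbar could serve in one read.

```python
# Illustrative toy model (NOT the paper's design): fetch contention when
# N cores share an instruction cache built from single-port (SP) banks,
# versus multi-port (MP) banks that serve all cores every cycle.

def sp_cycles(fetch_addrs_per_cycle, n_banks):
    """Cycles needed with SP banks: each bank serves one read per cycle.

    fetch_addrs_per_cycle: list of per-cycle lists of byte addresses,
    one per active core. The most-contended bank sets the stall, since
    the whole lockstep cluster waits for the slowest fetch.
    """
    total = 0
    for addrs in fetch_addrs_per_cycle:
        banks = [(a >> 2) % n_banks for a in addrs]  # word-interleaved banks
        total += max(banks.count(b) for b in set(banks))
    return total

def mp_cycles(fetch_addrs_per_cycle):
    """With MP banks, every core completes its fetch in one cycle."""
    return len(fetch_addrs_per_cycle)

# 4 cores in a data-parallel cluster: often on the same PC, sometimes divergent.
trace = [[0x100, 0x100, 0x100, 0x100],   # all cores fetch the same address
         [0x104, 0x108, 0x104, 0x10C]]   # divergent fetches
print(sp_cycles(trace, n_banks=4), mp_cycles(trace))  # → 6 2
```

The model only captures read-port contention; it says nothing about the latch-based banks' energy or area, which is where the paper's actual evaluation lies.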

Highlights

  • The Internet of Things (IoT) [1] is becoming pervasive in our everyday life, and its impact is expected to grow in the coming decades

  • Results show that the proposed multi-port cache architecture improves on the private cache by up to 40% in throughput, 20% in energy efficiency, and 30% in energy × area efficiency for instruction cache sizes of a few kB

  • We explored instruction cache architectures for energy-efficient and cost-effective tightly-coupled clusters of processors for end-node IoT devices

Summary

INTRODUCTION

The Internet of Things (IoT) [1] is becoming pervasive in our everyday life, and its impact is expected to grow in the coming decades. While PVT variations can be effectively managed in the digital cores by exploiting robust standard-cell libraries or post-fabrication compensation techniques, the supply voltage of standard 6T SRAMs has to be kept higher than that of the logic, making on-chip memory a major energy-efficiency bottleneck in Ultra-Low-Power (ULP) designs [10]. Results show that the proposed multi-port cache architecture improves on the private cache by up to 40% in throughput, 20% in energy efficiency, and 30% in energy × area efficiency for instruction cache sizes of a few kB (typical of the low-power microcontrollers used in end-node IoT devices).

Instruction Memory Hierarchy of ULP SoCs
Improving Energy Efficiency of Instruction Fetch Subsystem
Exploiting Shared Instruction Cache in Tightly-Coupled Clusters
ARCHITECTURE
SoC Architecture
Private Instruction Cache
Shared Instruction Cache
Multi-port Instruction Cache
RESULTS
Experimental Setup
Implementation results
Benchmarking
CONCLUSION