Exploiting Tightly-Coupled Cores

Daniel Bates,Robert Mullins,Andreas Koltes,Alex Bradbury

doi:10.1007/s11265-014-0944-6

Open Access

Journal: Journal of Signal Processing Systems	Publication Date: Aug 26, 2014
Citations: 101	License type: CC BY 4.0

Affiliation: University of Cambridge

Abstract

The individual processors of a chip-multiprocessor traditionally have rigid boundaries. Inter-core communication is only possible via memory, and control over a core's resources is localised. The specialisation necessary to meet today's challenging energy targets is typically achieved by providing a range of processor types and accelerators. An alternative approach is to permit specialisation by tailoring the way a large number of homogeneous cores are used. The approach here is to relax processor boundaries, create a richer mix of inter-core communication mechanisms and provide finer-grain control over, and access to, the resources of each core. We evaluate one such design, called Loki, that aims to support specialisation in software on a homogeneous many-core architecture. We focus on the design of a single 8-core tile, conceived as the building block for a larger many-core system. We explore the tile's ability to support a range of parallelisation opportunities and detail the control and communication mechanisms needed to exploit each core's resources in a flexible manner. Performance and a detailed breakdown of energy usage are provided for a range of benchmarks and configurations.

Highlights

Current multi-core approaches provide a rigid target for the programmer and compiler
We explore some of the many parallel execution patterns possible when fast and efficient inter-core communication is available
Three case studies examine different types of parallelism, each using the subset of benchmarks able to exploit it

Summary

Introduction

Current multi-core approaches provide a rigid target for the programmer and compiler. This inflexibility and the predetermined partitioning of resources complicate the writing of parallel programs. Computation and communication are often controlled by hardware mechanisms, making it difficult to streamline the implementation of a particular program to overcome increasingly severe power constraints. Perhaps surprisingly, while such concerns persist, the architecture of most multi-core chips diverges little from that of older multi-node machines, even though the on-chip design space is far less constrained.
