Abstract

The individual processors of a chip-multiprocessor traditionally have rigid boundaries. Inter-core communication is only possible via memory, and control over a core's resources is localised. The specialisation needed to meet today's challenging energy targets is typically provided through a range of processor types and accelerators. An alternative approach is to permit specialisation by tailoring the way a large number of homogeneous cores are used. The approach here is to relax processor boundaries, create a richer mix of inter-core communication mechanisms and provide finer-grain control over, and access to, the resources of each core. We evaluate one such design, called Loki, that aims to support specialisation in software on a homogeneous many-core architecture. We focus on the design of a single 8-core tile, conceived as the building block for a larger many-core system. We explore the tile's ability to support a range of parallelisation opportunities and detail the control and communication mechanisms needed to exploit each core's resources in a flexible manner. Performance results and a detailed breakdown of energy usage are provided for a range of benchmarks and configurations.

Highlights

  • Current multi-core approaches provide a rigid target for the programmer and compiler

  • We explore some of the many parallel execution patterns possible when fast and efficient inter-core communication is available

  • Three case studies examine different types of parallelism, each using the subset of benchmarks able to exploit it

Summary

Introduction

Current multi-core approaches provide a rigid target for the programmer and compiler. This inflexibility and the predetermined partitioning of resources complicate the writing of parallel programs. Computation and communication are often controlled by hardware mechanisms, making it difficult to streamline the implementation of a particular program to overcome increasingly severe power constraints. Perhaps surprisingly, while such concerns persist, the architecture of most multi-core chips diverges little from that of older multi-node machines, even though the on-chip design space is far less constrained.
