Abstract

Aggressive scaling in deep nanometer technology enables chip multiprocessor design, facilitated by the communication-centric architecture provided by Network-on-Chip (NoC). At the same time, it brings considerable reliability challenges, because a fault in the network architecture severely impacts system performance. To address these challenges, this research proposes NoCGuard, a reconfigurable architecture designed to tolerate multiple permanent faults in each pipeline stage of the generic router. The NoCGuard router architecture uses four highly reliable, low-cost fault-tolerant strategies. We exploit a resource borrowing and double routing strategy for the routing computation stage, a default winner strategy for the virtual channel allocation stage, a runtime arbiter selection and default winner strategy for the switch allocation stage, and a multiple secondary bypass paths strategy for the crossbar stage. Unlike existing reliable router architectures, our architecture features less redundancy, more fault tolerance, and high reliability. A reliability comparison using the Mean Time to Failure (MTTF) metric shows a 5.53-fold improvement in lifetime, and the Silicon Protection Factor (SPF) shows a 22-fold improvement, both better than state-of-the-art reliable router architectures. Synthesis results using 15 nm and 45 nm technology libraries show that the additional circuitry incurs an area overhead of 28.7% and 28%, respectively. Latency analysis using synthetic, PARSEC, and SPLASH-2 traffic shows a minor increase in latency of 3.41%, 12%, and 15%, respectively, while providing high reliability.
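
To make the two reliability metrics in the abstract concrete, the following minimal sketch shows how they are commonly computed. It assumes a constant failure rate, so that MTTF in hours equals 10^9 divided by the FIT rate (FIT counts failures per 10^9 device-hours), and it assumes the usual definition of SPF as the mean number of faults a design tolerates before failing, divided by its area normalized to the unprotected baseline. Whether the paper uses exactly these forms is an assumption; the function names and all numeric inputs below are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: names and inputs are hypothetical, not from the paper.

FIT_HOURS = 1e9  # FIT is defined as failures per 10^9 device-hours


def series_fit(fit_rates):
    """Total FIT of components in series (any single failure is fatal): rates add."""
    return sum(fit_rates)


def mttf_hours(fit_rate):
    """MTTF in hours, assuming a constant failure rate expressed in FIT."""
    if fit_rate <= 0:
        raise ValueError("FIT rate must be positive")
    return FIT_HOURS / fit_rate


def spf(mean_faults_to_failure, area_overhead):
    """Silicon Protection Factor, assuming SPF = faults tolerated / (1 + overhead).

    area_overhead is a fraction of the baseline area, e.g. 0.25 for 25%.
    """
    return mean_faults_to_failure / (1.0 + area_overhead)


# Hypothetical per-stage FIT rates for a four-stage router pipeline.
stage_fits = [100, 200, 300, 400]
total_fit = series_fit(stage_fits)
print(total_fit, mttf_hours(total_fit))

# Hypothetical design: tolerates 10 faults on average with 25% extra area.
print(spf(10, 0.25))
```

Under these definitions, an "N-fold improvement" is simply the ratio of the protected design's metric to the baseline's, so a higher SPF rewards designs that tolerate more faults per unit of added area.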

Highlights

  • A conventional way to increase chip performance is to improve its operational frequency. The power consumption of a chip shares a linear relationship with its operating frequency. This forces designers to search for other ways to increase performance without exponentially increasing power consumption

  • This led to the design of chip multi-processors (CMP) or multi-core architectures with high performance and low power consumption [2]

  • To facilitate fault tolerance at virtual channel allocation (VA), we propose adding two registers per 20:1 arbiter. IDVC (Identification of the Virtual Channel) holds the identification of the default winner virtual channel (VC) and is


Summary

Introduction

A conventional way to increase chip performance is to improve its operational frequency. Aggressive technology scaling in the deep nanometer regime enables the fabrication of billions of transistors on a chip [1]. This led to the design of chip multi-processors (CMP), or multi-core architectures, with high performance and low power consumption [2]. NoC architecture is a packet-based interconnection network that separates communication from computation. Because it differs from the shared bus, it facilitates customization in terms of bandwidth, buffer size, and topology. We work on a permanent fault tolerance mechanism for each pipeline stage of the router. It ensures connectivity of the healthy core associated with the faulty router. The rest of the paper is organized as follows: Section 2 presents an overview of existing reliable router architectures and fault detection mechanisms.

Related Work
Generic NoC Router Architecture
Overview of Router Input Port
RC Stage
VA Stage
SA Stage
XB Stage
RC Stage Fault Scenario
VA Stage Fault Scenario
SA Stage Fault Scenario
NoCGuard Router Micro-Architecture
RC Stage Fault-Tolerant Design
VA Stage Fault-Tolerant Design
SA Stage Fault-Tolerant Design
XB Stage Fault-Tolerant Design
Hardware Overhead Analysis
Reliability Analysis
FIT Rate of Generic 2-Stage NoC Router
FIT Rate of Correction Circuitry
MTTF Estimation of NoCGuard Router
Fault Estimation of Pipeline Stages
MDTF Estimation of NoCGuard Router
Latency Analysis
Findings
Conclusions
