Abstract

Networks-on-Chip (NoCs) are promising fabrics for scalable and efficient on-chip communication in large-scale many-core systems. Event-driven asynchronous NoCs have emerged as a promising replacement for the well-studied synchronous NoCs thanks to their strong timing robustness, especially when implemented in quasi-delay-insensitive (QDI) circuits. However, their fault tolerance has rarely been studied. QDI NoCs show complicated failure scenarios and behave differently from synchronous ones. As semiconductor technology scales and the aging process is expected to accelerate, permanent faults become more likely to occur at runtime. These faults can break the handshake, leading to physical-layer deadlocks that can spread and paralyze the whole QDI NoC. Such physical-layer deadlocks cannot be resolved using conventional fault-tolerant or deadlock management techniques. This paper systematically studies the impact of permanent faults on QDI NoCs and presents novel deadlock detection and recovery techniques to handle fault-caused physical-layer deadlocks. The proposed detection technique has been implemented to protect the NoC data paths, which occupy ~90% of the logic. When the detection and recovery techniques are employed to protect inter-router links (~60% of the logic), a permanently faulty link is precisely located and the network function can be recovered with graceful performance degradation.

Highlights

  • Networks-on-chip (NoCs) are a promising infrastructure to support on-chip communication of large-scale multicore systems due to their efficiency and [...] (Manuscript received November 13, 2016; revised March 4, 2017 and May 31, 2017; accepted July 5, 2017.)

  • Synchronous NoCs need to distribute the global clock with little skew over long distances, which may cross multiple timing domains belonging to different intellectual property (IP) cores

  • In a pure QDI NoC studied in this paper, this partial data will propagate to all downstream stages as long as they are ready (Section III), which will cause multiple deadlocks reported along the deadlocked packet path if their technique is used, failing to locate the fault position
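
The last highlight can be illustrated with a small Python sketch. It is a simplified model, not the paper's circuit: a word is a list of wires, `None` marks a wire that never becomes valid, and the names `propagate` and `stalled_stages` are hypothetical. The per-stage check stands in for a naive detector that flags any stage holding an incomplete word. That detector is an assumption, since the technique the highlight refers to is not described here.

```python
def propagate(word, num_stages, faulty_link, stuck_wire):
    """Return the word seen at the input of each pipeline stage.

    Link i feeds stage i. Assumption: the faulty link keeps one wire
    from ever becoming valid, and every downstream stage is ready, so
    the partial word is forwarded unchanged.
    """
    seen = []
    current = list(word)
    for link in range(num_stages):
        if link == faulty_link:
            current[stuck_wire] = None  # this wire never carries valid data
        seen.append(list(current))
    return seen


def stalled_stages(seen):
    """Naive detector (assumed): flag every stage holding an incomplete word."""
    return [i for i, w in enumerate(seen) if any(bit is None for bit in w)]


seen = propagate([1, 0, 1, 1], num_stages=5, faulty_link=2, stuck_wire=1)
flagged = stalled_stages(seen)
print(flagged)
```

In this model a single fault on link 2 makes the detector flag stage 2 and every stage after it, so the set of reports alone does not identify which link is faulty.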

Summary

INTRODUCTION

Networks-on-chip (NoCs) are a promising infrastructure to support on-chip communication of large-scale multicore systems due to their efficiency and [...]. Permanent faults can halt the handshaking process, resulting in physical-layer deadlocks. These deadlocks are different from network-layer ones caused by the cyclic dependence of packets [13].

ZHANG et al.: HANDLING PHYSICAL-LAYER DEADLOCK CAUSED BY PERMANENT FAULTS IN QDI NoCs

[...] work without locating and isolating the faulty component. Handling runtime permanent faults on QDI NoCs in such a deadlock state is more difficult than on synchronous NoCs. In the deep-submicrometer era, when reliability has become one of the critical challenges for digital systems [14], it is important to keep specific, critical, or ultra-expensive systems working even with some performance loss, which creates a demand for permanent-fault-tolerant QDI NoCs. This paper handles physical-layer deadlocks caused by permanent faults on QDI NoCs. Its contributions include the following. [...] 4) For intermittent faults (an early symptom of permanent faults) that are long enough to cause a deadlock, the recovery mechanism automatically resumes the isolated pipeline stages once the fault disappears.
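
How a permanent fault halts the handshake can be sketched in a few lines of Python. This is a minimal behavioural model under stated assumptions, not the paper's detection technique. It assumes a 4-phase (return-to-zero) handshake, a stuck-at fault on the acknowledge wire, and a simple step counter (`timeout`) that only shows the sender never making progress. The function name `four_phase_handshake` and its parameters are hypothetical.

```python
def four_phase_handshake(ack_stuck_at=None, timeout=8):
    """Simulate one 4-phase handshake from the sender's point of view.

    ack_stuck_at: None for a healthy wire, or 0/1 for a permanent
    stuck-at fault on the acknowledge wire (assumed fault model).
    Returns (completed, trace), where trace lists the (req, ack)
    pairs the sender last observed in each phase.
    """
    def observed_ack(req):
        # A healthy receiver answers req with the same level on ack;
        # a stuck-at fault fixes the wire regardless of req.
        return req if ack_stuck_at is None else ack_stuck_at

    trace = []
    # Phase 1: raise req and wait for ack = 1.
    # Phase 2: lower req and wait for ack = 0.
    for req, wanted_ack in ((1, 1), (0, 0)):
        waited = 0
        while observed_ack(req) != wanted_ack:
            waited += 1
            if waited >= timeout:
                # The sender would wait forever: a physical-layer deadlock.
                trace.append((req, observed_ack(req)))
                return False, trace
        trace.append((req, observed_ack(req)))
    return True, trace


print(four_phase_handshake())                # healthy link
print(four_phase_handshake(ack_stuck_at=0))  # blocks in phase 1
print(four_phase_handshake(ack_stuck_at=1))  # blocks in phase 2
```

Because there is no clock to force progress, the stalled sender simply stops. This differs from a network-layer deadlock, where the handshakes themselves still work and the packets wait on each other.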

Asynchronous Protocols
Permanent Faults on Synchronous Versus QDI Circuits
MODELING FAULT-CAUSED DEADLOCKS
Baseline QDI NoC Router
Deadlock Management Strategies for QDI NoCs
General Detection Strategy for Deadlocked QDI NoCs
DEADLOCK DETECTION ON QDI NoCs
Impact of Permanently Faulty Links
Detect Deadlock Caused by a Permanently Faulty Link
RECOVERY FROM PERMANENTLY FAULTY LINKS
Deadlock Removal
Faulty Link Isolation
TECHNICAL ISSUES
FAULT TOLERANCE EVALUATION AND COMPARISON
COMPARISON WITH RELATED WORK
Findings
CONCLUSION
