Abstract

Concurrent autonomous self-test, or online self-test, allows a system to test itself concurrently during normal operation, with no system downtime visible to the end user. Online self-test is important for overcoming major reliability challenges such as early-life failures and circuit aging in future System-on-Chips (SoCs). To ensure the required levels of overall SoC reliability, it is essential to apply online self-test to uncore components, e.g., cache controllers, DRAM controllers, and I/O controllers, in addition to processor cores, because uncore components can account for a significant portion of the overall logic area of a multi-core SoC. In this paper, we present an efficient online self-test technique for uncore components in SoCs. We achieve extremely high test coverage by storing high-quality test patterns in off-chip non-volatile storage. However, a simple technique that stalls the uncore component under test can result in significant system performance degradation or even visible system unresponsiveness. Our new techniques overcome these challenges and enable cost-effective online self-test of uncore components through three special hardware features: (1) resource reallocation and sharing (RRS); (2) no-performance-impact testing; and (3) smart backups. Implementation of online self-test for uncore components of the open-source OpenSPARC T2 multi-core SoC, using a combination of these three techniques, achieves high test coverage at < 1% area impact, < 1% power impact, and < 3% system-level performance impact. These results demonstrate the effectiveness and practicality of our techniques.
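The contrast between stalling a component under test and reallocating its work can be illustrated with a toy software model. The sketch below is only an illustration of the general idea behind resource reallocation and sharing, not the hardware described in the paper: the `Bank` and `Router` classes, the round-robin choice of a helper bank, and the request strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """Hypothetical stand-in for one of several identical uncore resources."""
    name: str
    under_test: bool = False
    served: list = field(default_factory=list)

    def handle(self, request):
        # Record the request as served by this bank.
        self.served.append(request)
        return f"{self.name} served {request}"


class Router:
    """Hypothetical router that shares a helper bank while another is tested."""

    def __init__(self, banks):
        self.banks = banks

    def route(self, home_index, request):
        # Try the home bank first; if it is under test, fall through to the
        # next bank that is not under test instead of stalling the request.
        n = len(self.banks)
        for offset in range(n):
            bank = self.banks[(home_index + offset) % n]
            if not bank.under_test:
                return bank.handle(request)
        raise RuntimeError("no bank available: all banks are under test")


banks = [Bank("bank0"), Bank("bank1")]
router = Router(banks)

banks[0].under_test = True              # bank0 enters self-test
result = router.route(0, "read 0x40")   # request is served by bank1 instead
banks[0].under_test = False             # bank0 returns to normal operation
```

In this model the request that would have stalled on `bank0` is served by `bank1`, at the cost of extra load on the helper. A real design would also have to keep state consistent between the two resources, which this sketch ignores.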

Highlights

  • In many System-on-Chips (SoCs), uncore components account for a significant portion of the total chip area

  • Since there has been a large amount of research on resilience techniques for on-chip memories (e.g., error correcting codes (ECC), transparent memory Built-In Self-Test (BIST), scrubbing, and sparing), this paper focuses on uncore logic blocks such as cache controllers, DRAM controllers, and I/O controllers

  • In OpenSPARC T2, we demonstrate the smart backup technique for the I/O subsystem, which consists of four modules: non-cacheable unit (NCU), system interface unit (SIU), PCI Express interface unit (PIU), and network interface unit (NIU)

Summary

Introduction

In many System-on-Chips (SoCs), uncore components account for a significant portion of the total chip area. For the open-source OpenSPARC T2 SoC [SUN 09] with 8 processor cores supporting 64 hardware threads, uncore components include on-chip memories (occupying 76% of the total chip area), cache controllers, DRAM controllers, I/O controllers, and a crossbar. In OpenSPARC T2, the non-memory uncore components account for 51% of the total non-memory chip area. This paper presents efficient online self-test techniques for uncore components of SoCs. Since there has been a large amount of research on resilience techniques for on-chip memories (e.g., error correcting codes (ECC), transparent memory BIST, scrubbing, and sparing), this paper focuses on uncore logic blocks such as cache controllers, DRAM controllers, and I/O controllers. Traditional pseudo-random Logic BIST may be used but it may be expensive
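The two OpenSPARC T2 area figures quoted above can be combined to estimate how much of the whole chip the non-memory uncore logic occupies. The short calculation below is a back-of-the-envelope sketch: it assumes the 76% memory share and the 51% share of non-memory area refer to the same total chip area, and the variable names are illustrative only.

```python
# Figures quoted in the text for OpenSPARC T2.
memory_fraction = 0.76                 # on-chip memories, share of total chip area
uncore_share_of_non_memory = 0.51      # non-memory uncore, share of non-memory area

# Everything that is not on-chip memory.
non_memory_fraction = 1.0 - memory_fraction

# Non-memory uncore logic as a share of the total chip area.
uncore_logic_fraction = non_memory_fraction * uncore_share_of_non_memory

print(f"non-memory area:        {non_memory_fraction:.0%} of chip")
print(f"non-memory uncore logic: {uncore_logic_fraction:.1%} of chip")
```

Under that assumption, non-memory uncore logic comes to roughly 12% of the total chip area, slightly more than half of all non-memory logic.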

Orchestration of online self-test
CASP Overview: CASP stands for Concurrent Autonomous chip self-test using
Select outputs from backup
Performance Impact of RRS and Smart Backup Techniques: RRS for L2 cache banks
Conclusions