Abstract
Concurrent autonomous self-test, or online self-test, allows a system to test itself concurrently during normal operation, with no system downtime visible to the end-user. Online self-test is important for overcoming major reliability challenges such as early-life failures and circuit aging in future System-on-Chips (SoCs). To ensure required levels of overall reliability of SoCs, it is essential to apply online self-test to uncore components, e.g., cache controllers, DRAM controllers, and I/O controllers, in addition to processor cores. This is because uncore components can account for a significant portion of the overall logic area of a multi-core SoC. In this paper, we present an efficient online self-test technique for uncore components in SoCs. We achieve extremely high test coverage by storing high-quality test patterns in off-chip non-volatile storage. However, a simple technique that stalls the uncore-component-under-test can result in significant system performance degradation or even visible system unresponsiveness. Our new techniques overcome these challenges and enable cost-effective online self-test of uncore components through three special hardware features: 1. resource reallocation and sharing (RRS); 2. no-performance-impact testing; and 3. smart backups. Implementation of online self-test for uncore components of the open-source OpenSPARC T2 multi-core SoC, using a combination of these three techniques, achieves high test coverage at < 1% area impact, < 1% power impact, and < 3% system-level performance impact. These results demonstrate the effectiveness and practicality of our techniques.
Highlights
In many System-on-Chips (SoCs), uncore components account for a significant portion of the total chip area
Since there has been a large amount of research on resilience techniques for on-chip memories (e.g., error correcting codes (ECC), transparent memory Built-In Self-Test (BIST), scrubbing, and sparing), this paper focuses on uncore logic blocks such as cache controllers, DRAM controllers, and I/O controllers
In OpenSPARC T2, we demonstrate the smart backup technique for the I/O subsystem, which consists of four modules: non-cacheable unit (NCU), system interface unit (SIU), PCI Express interface unit (PIU), and network interface unit (NIU)
Summary
In many System-on-Chips (SoCs), uncore components account for a significant portion of the total chip area. For the open-source OpenSPARC T2 SoC [SUN 09] with 8 processor cores supporting 64 hardware threads, uncore components include on-chip memories (occupying 76% of the total chip area), cache controllers, DRAM controllers, I/O controllers, and a crossbar. In OpenSPARC T2, the non-memory uncore components account for 51% of the total non-memory chip area. This paper presents efficient online self-test techniques for uncore components of SoCs. Since there has been a large amount of research on resilience techniques for on-chip memories (e.g., error correcting codes (ECC), transparent memory BIST, scrubbing, and sparing), this paper focuses on uncore logic blocks such as cache controllers, DRAM controllers, and I/O controllers. Traditional pseudo-random Logic BIST may be used, but it may be expensive
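The two area figures quoted above for OpenSPARC T2 can be combined into a single share of total chip area. The sketch below is only an illustration and rests on one assumption: that the 76% memory figure and the 51% figure use the same area accounting, so that the non-memory area is the remaining 24% of the chip. The function and constant names are hypothetical and do not come from the paper.

```python
# Area fractions quoted in the text for OpenSPARC T2.
MEMORY_SHARE_OF_CHIP = 0.76        # on-chip memories / total chip area
UNCORE_SHARE_OF_NON_MEMORY = 0.51  # non-memory uncore / total non-memory area


def non_memory_uncore_share_of_chip(memory_share, uncore_share_of_non_memory):
    """Return non-memory uncore logic as a fraction of total chip area.

    Assumes both inputs are measured on the same area basis, so the
    non-memory area is simply (1 - memory_share) of the chip.
    """
    non_memory_share = 1.0 - memory_share
    return non_memory_share * uncore_share_of_non_memory


share = non_memory_uncore_share_of_chip(
    MEMORY_SHARE_OF_CHIP, UNCORE_SHARE_OF_NON_MEMORY
)
print(f"Non-memory uncore logic: {share:.1%} of total chip area")
```

Under that assumption, the uncore logic targeted here is roughly an eighth of the total chip area, and about half of all logic area outside the memories.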