Abstract

Adding redundant components is a well-known industry technique for replacing defective components; it improves yield and, consequently, reduces manufacturing cost. Previously, most yield improvement strategies used redundant components only when another component had failed (i.e., cold spares). However, hot spares are becoming popular in commercial products (e.g., NVIDIA Ti GPU series). Hot spares reduce manufacturing cost when components are defective; otherwise, they can be used to improve performance in the field. In this paper, we investigate whether hot spares can improve performance per watt (PPW) in multi-core single-instruction, multiple-thread (SIMT) processors across different applications. We also investigate the cost and PPW implications of employing different types of hot spares in SIMT processors. We then study optimal solutions in the cost-PPW design space to determine which kinds of redundancy improve cost and PPW the most. Because evaluating individual design points (different SIMT processor configurations with redundancy) is time-consuming, we adapt a design space exploration algorithm that finds near-optimal solutions without evaluating the design space exhaustively; its approximated optimal solutions are three times better than those of conventional methods. We observe that hot sparing is effective for specific types of SIMT processor configurations (small and medium sized). On these configurations, it can improve PPW by more than 16%, on average, for applications whose performance improves significantly when hot spares are added (e.g., FFT and FILTER). Furthermore, we show that hot sparing's PPW improvement on these applications is comparable to that of conventional techniques (e.g., voltage scaling) and that it can be combined with them to improve PPW more effectively.
We also observe that microarchitectural hot redundant resources (e.g., hot shared-spare lanes) achieve better PPW improvement than conventional architectural redundancies (e.g., hot spare cores).

CCS Concepts: • Hardware → Yield and cost optimization; Application specific processors; Redundancy
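
To make the notion of optimal solutions in the cost-PPW design space concrete, the following is a minimal Python sketch of Pareto filtering over two objectives: lower cost and higher PPW. It is an illustration under stated assumptions, not the exploration algorithm adapted in the paper: it filters a small list exhaustively, and the `DesignPoint` type, the configuration labels, and all numeric values are hypothetical.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str    # hypothetical configuration label
    cost: float  # relative manufacturing cost (lower is better)
    ppw: float   # relative performance per watt (higher is better)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.ppw >= b.ppw
    strictly_better = a.cost < b.cost or a.ppw > b.ppw
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep every point that no other point dominates (exhaustive O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up numbers for illustration only; these are not results from the paper.
candidates = [
    DesignPoint("config_a", 1.00, 1.00),
    DesignPoint("config_b", 1.10, 1.05),
    DesignPoint("config_c", 1.04, 1.12),
    DesignPoint("config_d", 1.08, 1.00),
]

front = pareto_front(candidates)
print([p.name for p in front])
```

In this toy data, `config_b` and `config_d` are each dominated by another candidate, so only the remaining two survive as trade-off points. An exploration algorithm of the kind the abstract describes would aim to approximate such a front without evaluating every candidate.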
