Level Data Cache Research Articles

Early reliability assessment of hardware structures using microarchitecture level simulators can effectively guide major error protection decisions in microprocessor design. Statistical fault injection on microarchitectural structures modeled in performance simulators is an accurate method to measure their Architectural Vulnerability Factor (AVF) but requires excessively long campaigns to obtain high statistical significance. We propose MeRLiN1, a methodology to boost microarchitecture level injection-based reliability assessment by several orders of magnitude and keep the accuracy of the assessment unaffected even for large injection campaigns with very high statistical significance. The core of MeRLiN is the grouping of faults of an initial list in equivalent classes. All faults in the same group target equivalent vulnerable intervals of program execution ending up to the same static instruction that reads the faulty entries. Faults in the same group occur in different times and entries of a structure and it is extremely likely that they all have the same effect in program execution; thus, fault injection is performed only on a few representatives from each group. We evaluate MeRLiN for different sizes of the physical register file, the store queue and the first level data cache of a contemporary microarchitecture running MiBench and SPEC CPU2006 benchmarks. For all our experiments, MeRLiN is from 2 to 3 orders of magnitude faster than an extremely high statistical significant injection campaign, reporting the same reliability measurements with negligible loss of accuracy. Finally, we theoretically analyze MeRLiN's statistical behavior to further justify its accuracy.

Read full abstract

ABSTRACTStacking multiple device strata can improve system performance of a microprocessor (μP) by reducing interconnect length. This enables latency improvement, power reduction, and improved memory bandwidth. In this paper we review some of our recent design analysis and process results which quantitatively show the benefits of stacking applied to μPs.We report on two applications for stacking which take advantage of reduced wire length- “logic+logic” stacking and “logic+memory” stacking. In addition to optimizing minimum wire length, we considered carefully the thermal ramifications of the new designs. For the logic+memory application, we considered the case of reducing off-die wiring by stacking a DRAM cache (32 to 64MB) onto a high performance μP. Simulations showed 3x reduced off-die bandwidth, Cycles Per Memory Access (CPMA) reduction of 13%, and a 66% average bus power reduction. For logic+logic applications, we considered a high performance μP where the unit blocks were repartitioned into two strata. For this case, simulations showed that stacking can simultaneously reduce power by 15% while increasing performance by 15% with a minor 14° C increase in peak temperature compared to the planar design. Using voltage scaling, this translates to 34% power reduction and 8% performance improvement with no temperature increase. We found that these results can be further improved by a secondary splitting of the individual blocks. As an example, we split a 32KB first level data cache resulting in 25% power reduction, 10% latency reduction, and 20% area reduction.We also discuss the fabrication of stacked structures with two complimentary process flows. In one case, we developed a 300mm wafer stacking process using Cu-Cu bonding, wafer thinning, and through-silicon vias (TSVs). This technology provides reliable bonding with non-detectable bonding-interface resistance and inter-strata via pitch below 8μm. We investigated the impact of this wafer stacking process to the transistor and interconnect layers built using a 65nm strained-Si/Cu-Low-K process technology and found no impact to either discrete N- and P-MOS devices or to thin 4Mb SRAMs. We verified fully functional SRAMs on thinned wafers with thicknesses down to 5μm. Although wafer stacking leads itself well to tight-pitch same-die-size stacking, die stacking enables integration of different size dies and includes opportunity to improve yield by stacking known good dies. We demonstrated a die stack process flow with 75μm thinned die, TSV, and inter-strata via pitch below 100μm. We also found negligible impact to transistors using this process flow. Multiple stacks of up to seven 75μm thin dies with TSVs were fabricated and tested. Prospects for high volume integration of 3D into μPs are discussed.

Read full abstract

Level Data Cache Research Articles

Related Topics

Articles published on Level Data Cache

Early miss prediction based periodic cache bypassing for high performance GPUs

MeRLiN

Designing a practical data filter cache to improve both energy efficiency and performance

A SystemC Cache Simulator for a Multiprocessor Shared Memory System

SS-SERA: An improved framework for architectural level soft error reliability analysis

REDUCING CACHE HIERARCHY ENERGY CONSUMPTION BY PREDICTING FORWARDING AND DISABLING ASSOCIATIVE SETS

Applying architectural vulnerability Analysis to hard faults in the microprocessor

Exploiting Narrow Values for Soft Error Tolerance

Design and Fabrication of 3D Microprocessors

Static energy reduction techniques for microprocessor caches

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Level Data Cache Research Articles

Related Topics

Articles published on Level Data Cache

Early miss prediction based periodic cache bypassing for high performance GPUs

MeRLiN

Designing a practical data filter cache to improve both energy efficiency and performance

A SystemC Cache Simulator for a Multiprocessor Shared Memory System

SS-SERA: An improved framework for architectural level soft error reliability analysis

REDUCING CACHE HIERARCHY ENERGY CONSUMPTION BY PREDICTING FORWARDING AND DISABLING ASSOCIATIVE SETS

Applying architectural vulnerability Analysis to hard faults in the microprocessor

Exploiting Narrow Values for Soft Error Tolerance

Design and Fabrication of 3D Microprocessors

Static energy reduction techniques for microprocessor caches