Abstract

Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the major challenge of silent data corruptions (SDCs) and aim on solutions to minimize its impact by avoiding, detecting, and mitigating SDCs. Recent studies on large scale datacenters conducted by Meta and Google report an unexpected rate of silent data corruption incidents that are attributed to modern microprocessor generations. Despite the acknowledged severity of the phenomenon, particularly at the datacenter scale, there is no in-depth analysis of the microarchitectural locations in a complex microprocessor that are more likely to generate an SDC at the program outputs. In this paper, we present a detailed analysis of the faulty behavior of many critical microarchitectural structures of a modern out-of-order microprocessor generating silent data corruptions. Our analysis unveils several observations, including: (i) the magnitude of silent data corruptions attributed to different hardware structures, (ii) the instruction-related parameters that are more likely to result in a silent data corruption, (iii) the extent to which the operating system affects the silent data corruption occurrences, and (iv) the byte positions of a word which are more likely to result in silent data corruptions. Collectively, such findings can assist decisions for hardware and software schemes for the reduction of the likelihood of silent data corruptions generation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call