Thoughts on high-performance computing

Xuejun Yang

doi:10.1093/nsr/nwu002

Abstract

Parallel computing is the main technical approach to achieving very high computing performance. The history of parallel computing can be divided into three phases: moderate parallelism, described by Amdahl's law [1]; large-scale parallelism, described by Gustafson's law [2]; and high-productivity parallelism, described by the productivity evaluation model [3]. In April 2010, IBM, in its report 'Some Challenges on Road from Petascale to Exascale', identified five challenges facing an exascale system, stemming from power consumption, memory access, communication, reliability, and programming [4]. These are referred to, respectively, as the energy wall, memory wall, communication wall, reliability wall, and programming wall. Faced with these walls, we investigate models for measuring them at the scientific level. For example, existing reliability theories, such as probability theory, do not consider the effect of reliability on performance, while the classic speedup model does not reflect the relation between performance and reliability. To incorporate reliability and performance into a unified measurement model, we measure reliability in terms of fault-tolerant overhead. Because current fault-tolerant techniques incur a certain time overhead, we created a reliability speedup model with fault tolerance that measures the effect of fault-tolerant overhead on speedup:
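The two classic laws cited above can be compared numerically. The sketch below is not the reliability speedup model mentioned in the preceding sentence, whose formula is not reproduced in this text. It implements only the textbook forms of both laws, where f is the serial fraction and p the number of processors:

- Amdahl's law: S = 1 / (f + (1 - f) / p)
- Gustafson's scaled speedup: S = f + (1 - f) * p

The third function, `speedup_with_overhead`, is a hypothetical illustration. It assumes that fault tolerance simply adds a fixed fraction of the single-processor time to the parallel run time. Its name and that assumption are ours, not the paper's.

```python
"""Classic speedup laws, plus a hypothetical overhead-adjusted variant.

The overhead variant is an illustrative assumption only; it is NOT the
reliability speedup model described in the abstract.
"""


def _check(f: float, p: int) -> None:
    # The serial fraction must be a proportion; at least one processor is needed.
    if not 0.0 <= f <= 1.0:
        raise ValueError("serial fraction f must lie in [0, 1]")
    if p < 1:
        raise ValueError("processor count p must be >= 1")


def amdahl_speedup(f: float, p: int) -> float:
    """Amdahl's law (fixed problem size): S = 1 / (f + (1 - f) / p).

    f is the serial fraction of the single-processor run time.
    """
    _check(f, p)
    return 1.0 / (f + (1.0 - f) / p)


def gustafson_speedup(f: float, p: int) -> float:
    """Gustafson's law (problem size scaled with p): S = f + (1 - f) * p.

    Here f is the serial fraction of the run time measured on p processors.
    """
    _check(f, p)
    return f + (1.0 - f) * p


def speedup_with_overhead(f: float, p: int, overhead: float) -> float:
    """HYPOTHETICAL: Amdahl-style speedup when fault tolerance adds
    `overhead` (a fraction of the single-processor time T1) to the
    parallel run time. An illustrative assumption, not the paper's model.
    """
    _check(f, p)
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    # Parallel run time, normalized so that T1 = 1.
    t_parallel = f + (1.0 - f) / p
    return 1.0 / (t_parallel + overhead)


if __name__ == "__main__":
    f = 0.05
    for p in (16, 256, 1024):
        print(
            f"p={p:5d}  Amdahl={amdahl_speedup(f, p):7.2f}  "
            f"Gustafson={gustafson_speedup(f, p):8.2f}  "
            f"Amdahl+2% overhead={speedup_with_overhead(f, p, 0.02):6.2f}"
        )
```

Take f = 0.05 as an example. Amdahl's fixed-size speedup can never exceed 1/f = 20, however many processors are added. Gustafson's scaled speedup, by contrast, keeps growing roughly linearly with p. Under the stated assumption, any added fault-tolerant overhead lowers the fixed-size speedup further.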

Journal: National Science Review
Publication Date: Sep 1, 2014
License type: CC BY 4.0

Similar Papers

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism
Erlin Yao ... Guangming Tan
01 May 2012

Recent developments in high‐performance computing and simulation: distributed systems, architectures, algorithms, and applications
Waleed W Smari ... Sandro Fiore
Concurrency and Computation: Practice and Experience | VOL. 27
25 May 2015

Convergence-aware optimal checkpointing for exploratory deep learning training jobs
Hongliang Li ... Haixiao Xu
Future Generation Computer Systems | VOL. 164
01 Mar 2025

Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud
Ifeanyi P Egwutuoha ... Rafael Calvo
International Journal of Parallel, Emergent and Distributed Systems | VOL. 29
22 Jan 2014
