Benchmarking Radiation Transport Monte Carlo Simulations with MCNP and Geant4 Using High Performance Computing.
The objective of this paper is to compare the performance of several high-performance computing systems in order to inform decisions regarding their use for Monte Carlo simulations of radiation transport. Gamma-ray emission from ¹³¹I in the human thyroid and its detection using a personal radiation detector were modeled using the MCNP and Geant4 Monte Carlo software. These simulations were benchmarked by recording the computing time needed to run each simulation as a function of the number of parallel computing threads used. Simulations were run using a virtual machine, two desktop PCs, a CX-1 supercomputer, the Government of Canada General Purpose Science Cluster, and cloud computing. Using a higher number of parallel threads on these high-performance computing systems was found to reduce the computing time needed to run the MCNP and Geant4 simulations. The optimal configuration for running the simulations on cloud computing was evaluated, considering the number of available processors, the computing time, and the cost. Cloud computing was found to be a cost-effective, on-demand, high-performance computing option for Monte Carlo simulations.
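A benchmark of computing time against thread count, as described above, is commonly summarized as speedup, parallel efficiency, and (for cloud runs) cost. The following is a minimal Python sketch of that bookkeeping. The helper `summarize`, the timings, and the hourly price are all hypothetical placeholders and are not results or methods taken from the paper.

```python
def summarize(timings, price_per_core_hour):
    """Return (threads, speedup, efficiency, cost) for each thread count.

    timings: dict mapping thread count -> wall-clock seconds.
    price_per_core_hour: assumed flat cloud price (hypothetical).
    """
    base_threads = min(timings)          # smallest run is the reference
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        seconds = timings[threads]
        speedup = base_time / seconds
        # Efficiency relative to the reference run's thread count.
        efficiency = speedup * base_threads / threads
        # Cost assumes billing per core for the full wall-clock time.
        cost = threads * seconds / 3600.0 * price_per_core_hour
        rows.append((threads, speedup, efficiency, cost))
    return rows


# Placeholder timings in seconds, NOT measurements from the paper.
example = {1: 3600.0, 4: 1000.0, 16: 300.0}
for threads, speedup, efficiency, cost in summarize(example, 0.05):
    print(f"{threads:>3} threads: speedup {speedup:5.2f}, "
          f"efficiency {efficiency:4.2f}, cost ${cost:.3f}")
```

Listing cost next to efficiency makes the trade-off visible: adding threads can keep shortening the run while the total bill rises.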
- Supplementary Content
- 10.5451/unibas-007207821
- Jan 1, 2019
- edoc (University of Basel)
Scientific applications are often large, complex, computationally intensive, and irregular. Loops are often an abundant source of parallelism in scientific applications. Due to the ever-increasing computational needs of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering increased parallelism at multiple hardware levels. Load imbalance, caused by irregular computational load per task and unpredictable computing system characteristics (system variability), often degrades the performance of applications. In addition, perturbations, such as reduced computing power, degraded network latency or availability, or failures, can severely impact the performance of the applications. System variability and perturbations are only expected to increase in future extreme-scale computing systems. Extrapolating the current failure rate to Exascale would result in a failure every 20 minutes. Such a failure rate and such perturbations would render the computing systems unusable. This doctoral thesis improves the performance of computationally intensive scientific applications on HPC systems via robust load balancing. Robust scheduling ensures and maintains improved load-balanced execution under unpredictable application and system characteristics. A number of dynamic loop self-scheduling (DLS) techniques were introduced and successfully used in scientific applications between the 1980s and 2000s. These DLS techniques are not fault-tolerant as originally introduced. In this thesis, we identify three major research questions to achieve robust scheduling: (1) How to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications? (2) How to select a DLS technique that will achieve improved performance under perturbations? (3) How to tolerate perturbations during execution and maintain a load-balanced execution on HPC systems?
To answer the first question, we reproduced the original experiments that introduced the DLS techniques to verify their present implementation. Simulation is used to reproduce experiments on systems from the past. A realistic simulation leads to analysis and conclusions similar to those drawn from the native results. To this end, we devised an approach for bridging the native and simulative executions of parallel applications on HPC systems. This simulation approach is used to reproduce scheduling experiments on past and present systems to verify the implementation of DLS techniques. Given the multiple levels of parallelism offered by present HPC systems, we analyzed the load imbalance in scientific applications, from computer vision, astrophysics, and mathematical kernels, at both thread and process levels. This analysis revealed a significant interplay between thread-level and process-level load balancing. We found that dynamic load balancing at the thread level propagates to the process level and vice versa. However, the best application performance is only achieved by two-level dynamic load balancing. Next, we examined the performance of applications under perturbations. We found that the most robust DLS technique does not deliver the best performance under various perturbations. The most efficient DLS technique changes with the application, the system, or the perturbations during execution. This points to the algorithm selection problem in DLS. We leveraged realistic simulations to address the algorithm selection problem of scheduling under perturbations via a simulation-assisted approach (SimAS), which answers the second question. SimAS dynamically selects DLS techniques that improve the performance depending on the application, system, and perturbations during the execution.
To answer the third question, we introduced a robust dynamic load balancing (rDLB) approach for the robust self-scheduling of scientific applications under failures. rDLB proactively reschedules already allocated tasks and requires no detection of perturbations. rDLB tolerates up to P − 1 processor failures (P is the number of processors allocated to the application) and boosts the flexibility of applications against nonfatal perturbations, such as reduced availability of resources. This thesis is the first to provide insights into the interplay between thread-level and process-level dynamic load balancing in scientific applications. Verified DLS techniques, SimAS, and rDLB are integrated into an MPI-based dynamic load balancing library (DLS4LB), which supports thirteen DLS techniques, for robust dynamic load balancing of scientific applications on HPC systems. Using the methods devised in this thesis, we improved the performance of scientific applications by up to 21% via two-level dynamic load balancing. Under perturbations, we enhanced their performance by a factor of 7 and their flexibility by a factor of 30. This thesis opens up new horizons for understanding the interplay of load balancing between various levels of software parallelism and lays the groundwork for robust multilevel scheduling for the upcoming Exascale HPC systems and beyond.
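As an illustration of what a dynamic loop self-scheduling technique does, the sketch below computes the chunk sizes of guided self-scheduling (GSS), a classic DLS technique from the period the abstract mentions. It is offered as a generic example only: the abstract does not list the thirteen techniques in DLS4LB, and the function name `gss_chunks` is hypothetical rather than part of that library.

```python
import math


def gss_chunks(total_iterations, workers):
    """Yield successive chunk sizes under guided self-scheduling (GSS).

    Each work request receives ceil(remaining / workers) iterations, so
    chunks start large (low scheduling overhead) and shrink toward the end
    of the loop (better load balance).
    """
    remaining = total_iterations
    while remaining > 0:
        chunk = math.ceil(remaining / workers)
        yield chunk
        remaining -= chunk


# Illustrative loop of 100 iterations shared by 4 workers.
print(list(gss_chunks(100, 4)))
```

The decreasing chunk sizes are the design point: early large chunks amortize scheduling cost, while the small final chunks let faster workers absorb leftover work.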
- Book Chapter
- 10.1007/978-3-319-66896-3_4
- Jan 1, 2017
Usually, scientific applications outlive the lifespan of the High Performance Computing (HPC) systems for which they are initially developed. Innovations in HPC hardware and parallel programming standards drive the modernization of HPC applications so that they continue to perform well. While such code modernization efforts may not be challenging for HPC experts and well-funded research groups, many domain experts and students may find it challenging to adapt their applications for the latest HPC systems due to a lack of expertise, time, and funds. The challenges faced by such domain experts and students can be mitigated by providing them with high-level tools for code modernization and migration. A brief overview of two such high-level tools is presented in this chapter. These tools support code modernization and migration efforts by assisting users in parallelizing their applications and porting them to HPC systems with high-bandwidth memory. The tools are the Interactive Parallelization Tool (IPT) and the Interactive Code Adaptation Tool (ICAT). Such high-level tools not only improve the productivity of their users and the performance of the applications, but also improve the utilization of HPC resources.
- Conference Article
- 10.1109/itc-cscc.2019.8793388
- Jun 1, 2019
A High Performance Computing (HPC) system is a cluster of hundreds or more processing nodes used for applications that require a large amount of computation. The need for HPC systems has increased in recent years as more applications, such as deep learning and AI, require heavy computation. Various tools and techniques are being developed and researched to operate HPC systems efficiently in line with this need. In this paper, we develop multi-node power/performance modeling. This will help HPC system users to predict power and performance before they configure the system so that they can implement the optimal HPC system for programmers. The power and performance prediction model achieves 90% accuracy and takes less than an hour to produce a prediction.
- Research Article
- 10.1007/s11704-010-0372-0
- Nov 4, 2010
- Frontiers of Computer Science in China
The China HPC TOP100 list, an annual report of the 100 most powerful high performance computing (HPC) systems installed in mainland China, has traced the rapid growth of HPC technology in China since its first publication in 2002. This paper introduces the China HPC TOP100 list and reviews the current status of HPC systems in China in terms of system features, manufacturers, and areas of application using the data reported in the most recent list, published on November 1st, 2009. We provide further analysis, predictions of future trends, and directions for the development of HPC systems in China, drawing on historical data accumulated through archived TOP100 lists and other publicly available information. We predict that the aggregated Linpack performance of the top 100 HPC systems will reach 10 PFlops in 2011, a single system with 10 PFlops peak performance will appear between 2012 and 2013, the aggregated performance of the top 100 systems will reach 100 PFlops in 2014, and a single system with 100 PFlops peak performance will appear around 2015.
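The two aggregate predictions above (10 PFlops in 2011 and 100 PFlops in 2014) imply a constant yearly growth factor, which the short Python sketch below works out. It assumes simple exponential growth between the two dates. The helper name `annual_growth_factor` is hypothetical, and the calculation is an illustration, not the forecasting method used in the paper.

```python
def annual_growth_factor(start_value, end_value, years):
    """Constant yearly multiplier that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years)


# Aggregated Linpack performance of the top 100 systems, as predicted in
# the abstract: 10 PFlops in 2011 and 100 PFlops in 2014.
aggregate = annual_growth_factor(10.0, 100.0, 2014 - 2011)
print(f"implied aggregate growth: x{aggregate:.2f} per year")
```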
- Conference Article
- 10.1109/icbda.2017.8078793
- Mar 1, 2017
The growing demand in IT services for improved efficiency and quality at low cost when handling complex compute requirements has led to the integration of high performance computing (HPC) systems and cloud infrastructure in data centers. Earlier, HPC systems were limited to academic and research institutions and engineering laboratories. However, the emergence of cloud infrastructures and their successful implementation in different application areas, including manufacturing and other industries, has motivated their integration with HPC systems, providing multiple benefits for scientific, industrial, and enterprise organizations. This paper explores architectures of HPC and cloud services in data centers and highlights the benefits of integrating these architectures.
- Book Chapter
- 10.1007/978-3-319-33742-5_16
- Jan 1, 2016
Data-intensive computing brings a new set of challenges that do not completely overlap with those met by the more typical and even state-of-the-art High Performance Computing (HPC) systems. Working with ‘big data’ can involve analyzing thousands of files that need to be rapidly opened, examined, and cross-correlated, tasks that classic HPC systems might not be designed to do. Such tasks can be efficiently conducted on a data-intensive supercomputer like the Wrangler supercomputer at the Texas Advanced Computing Center (TACC). Wrangler allows scientists to share and analyze the massive collections of data being produced in nearly every field of research today in a user-friendly manner. It was designed to work closely with the Stampede supercomputer, which is ranked as the tenth most powerful in the world by TOP500 and is the HPC flagship of TACC. Wrangler was designed to keep much of what was successful with systems like Stampede, but also to introduce new features such as a very large flash storage system, a very large distributed spinning-disk storage system, and high-speed network access. This gives users whose data analysis needs were not being met by traditional HPC systems like Stampede a new way to access HPC resources. In this chapter, we provide an overview of the Wrangler data-intensive HPC system along with some of the big data use cases that it enables.
- Conference Article
- 10.1145/3626203.3670537
- Jul 17, 2024
The Council for Scientific and Industrial Research (CSIR)'s National Integrated Cyberinfrastructure System (NICIS) plays a pivotal role in advancing two key initiatives that focus on developing cyberinfrastructure across the African continent: the Southern African Development Community (SADC) Cyberinfrastructure Framework, and the Square Kilometre Array (SKA) Partner Countries Big Data initiative. Within NICIS these initiatives are managed through the HPC Ecosystems Project, which has two primary objectives: distributing entry-level High Performance Computing (HPC) systems by repurposing decommissioned tier-1 HPC systems, and cultivating a skilled HPC workforce across Africa. The first deployment of HPC systems under the project occurred in 2013, using repurposed hardware from the Texas Advanced Computing Center's decommissioned Ranger HPC system. These systems were allocated to bolster research capabilities at local research institutes in South Africa and within partner countries of the SKA project across Africa. A decade later, at the close of 2023, the HPC Ecosystems Project has deployed 35 HPC systems in 11 countries and delivered more than 30 formal HPC training workshops to over 700 participants, surpassing 21,000 total participation hours. There is an active and growing virtual community exceeding 230 HPC practitioners globally. This paper provides a high-level overview of the first ten years of the project, outlining the various approaches towards establishing sustainable cyberinfrastructure and HPC workforces in Africa. Included is a reflection on the challenges experienced, lessons learned, and progress made towards delivering cyberinfrastructure resources and HPC training to resource-constrained environments.
- Conference Article
- 10.1109/adcom.2012.6563585
- Dec 1, 2012
Power measurement and analysis are important for optimizing the power consumption of High Performance Computing (HPC) systems. With the huge increase in the power consumption of HPC systems, it is important to compare systems using metrics based on performance per watt. Various hardware-based and software-based power measurement techniques are available for HPC systems. However, accurately measuring and analyzing the power consumption of entire HPC nodes in real time is a complex task. Hence, we have used a hardware-based power measurement technique with a multi-agent framework for analyzing power in HPC systems in real time. We demonstrate the power consumed while running various workloads such as High Performance Linpack (HPL) and the NAS Parallel Benchmarks (NPB).
- Conference Article
- 10.1109/hpcc/smartcity/dss.2019.00315
- Aug 1, 2019
System logs provide invaluable resources for understanding system behavior and detecting anomalies on high performance computing (HPC) systems. As HPC systems continue to grow in both scale and complexity, the sheer volume of system logs and the complex interaction among system components make traditional manual problem diagnosis and even automated line-by-line log analysis infeasible or ineffective. Sequence mining technologies aim to identify important patterns among a set of objects, which can help us discover regularity among events, detect anomalies, and predict events in HPC environments. Existing sequence mining algorithms are compute-intensive and too inefficient to process the overwhelming number of system events, which have complex interactions and dependencies. In this paper, we present a novel, topology-aware sequence mining method (named TSM) and explore it for event analysis and anomaly detection on production HPC systems. TSM is resource-efficient and capable of producing long and complex event patterns from log messages, which makes TSM suitable for online monitoring and diagnosis of large-scale systems. We evaluate the performance of TSM using system logs collected from a production supercomputer. Experimental results show that TSM is highly efficient in identifying event sequences on single and multiple nodes without any prior knowledge. We apply verification functions and requirements and prove the correctness of the event patterns produced by TSM.
- Research Article
- 10.1080/17445760.2013.803686
- Jan 22, 2014
- International Journal of Parallel, Emergent and Distributed Systems
Cloud computing offers new computing paradigms, capacity, and flexible solutions to high performance computing (HPC) applications. For example, Hardware as a Service (HaaS) allows users to provision a large number of virtual machines (VMs) for computation-intensive applications. Due to the large number of VMs and electronic components in an HPC system in the cloud, any fault during execution would result in re-running the applications, which costs time, money, and energy. In this paper we present a proactive fault tolerance (FT) approach to HPC systems in the cloud to reduce the wall-clock execution time and dollar cost in the presence of faults. We also develop a generic FT algorithm for HPC systems in the cloud. Our algorithm does not rely on a spare node prior to the prediction of a failure. We also develop a cost model for executing computation-intensive applications on HPC systems in the cloud. We analyse the dollar cost of provisioning spare nodes and of checkpointing FT to assess the value of our approach. Our experimental results obtained from a real cloud execution environment show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be reduced by as much as 30%. The frequency of checkpointing of computation-intensive applications can be reduced by up to 50% with our FT approach for HPC in the cloud compared with current FT approaches.
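To make the link between checkpointing frequency and cost concrete, the sketch below uses Young's first-order approximation for the checkpoint interval, sqrt(2 × checkpoint cost × mean time between failures). This is a well-known general rule of thumb swapped in for illustration. It is not the proactive FT algorithm or the cost model of the paper, and the numbers and the function name `young_interval` are hypothetical.

```python
import math


def young_interval(checkpoint_seconds, mtbf_seconds):
    """Young's approximation of the checkpoint interval, in seconds.

    checkpoint_seconds: time to write one checkpoint.
    mtbf_seconds: mean time between failures of the whole system.
    """
    return math.sqrt(2.0 * checkpoint_seconds * mtbf_seconds)


# Hypothetical values: a 60 s checkpoint and a 24 h system MTBF.
interval = young_interval(60.0, 24 * 3600.0)
print(f"checkpoint roughly every {interval / 60.0:.1f} minutes")
```

Under this rule, a longer effective time between failures lengthens the interval, so fewer checkpoints are written. That is the kind of reduction in checkpointing frequency the abstract reports for its own approach.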
- Conference Article
- 10.1109/bdcloud.2018.00079
- Dec 1, 2018
With the increasing scale and complexity of high performance computing (HPC) systems, the programming, debugging, and tuning of large-scale parallel programs face a series of challenges. One of them is that programmers often need to run their programs repeatedly with a large number of processes on HPC systems to identify sources of errors and performance bottlenecks, which consumes large amounts of resources. Furthermore, since most HPC systems use a job scheduling system to manage their resources and schedule multiple jobs from different users, programmers cannot interact with their programs during execution, which further complicates debugging and tuning. To address this challenge, this paper proposes a system that re-runs large-scale MPI parallel programs using two nodes. Following an approach of one real execution plus multiple emulation executions, the parallel program is first executed with the desired number of processes on an HPC system, which is referred to as the real execution; during this execution, our system records the MPI messages transmitted among processes as well as the control information of the processes. After that, one or more processes can be re-run on a two-node local system at the same scale as the real execution. In the meantime, programmers can interact with their programs by attaching GDB, a commonly used debugger, to the re-running process. Therefore, our system not only significantly reduces resource consumption in debugging and tuning large-scale parallel programs, but also supports interaction between developers and their programs during execution, which makes it easier for programmers to identify the sources of errors and performance bottlenecks in their parallel programs.
- Book Chapter
- 10.4018/978-1-5225-0287-6.ch010
- Jan 1, 2016
High performance computing (HPC) systems are becoming the norm for daily use, and care must be taken to ensure that these systems are resilient. Recent contributions on resiliency have taken quantitative and qualitative perspectives in which general system failures are considered. However, there are limited contributions dealing with the specific classes of failures that are directly related to cyber-attacks. In this chapter, the author uses the concepts of transition processes and limiting distributions to perform a generic theoretical investigation of the effects of targeted failures by relating the actions of the cyber-enemy (CE) to different risk levels in an HPC system. Special cases of constant attack strategies are considered where exact solutions are obtained. Additionally, a stopped process is introduced to model the effects of system termination. The results of this representation can be directly applied throughout the HPC community for monitoring and mitigating cyber-attacks.
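The idea of a limiting distribution over risk levels can be illustrated with a small discrete-time Markov chain. The Python sketch below is a generic example under stated assumptions: the three risk levels, the transition probabilities, and the function name `limiting_distribution` are all hypothetical, and the chapter's actual transition process is not specified in the abstract.

```python
def limiting_distribution(matrix, steps=1000):
    """Approximate the limiting distribution of a Markov chain.

    matrix[i][j] is the probability of moving from state i to state j.
    Repeatedly applying the transition matrix to a starting distribution
    converges for an irreducible, aperiodic chain.
    """
    n = len(matrix)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n))
                for j in range(n)]
    return dist


# Hypothetical risk levels: 0 = low, 1 = elevated, 2 = critical.
transitions = [
    [0.9, 0.1, 0.0],
    [0.3, 0.6, 0.1],
    [0.5, 0.0, 0.5],
]
risk = limiting_distribution(transitions)
print([round(p, 4) for p in risk])
```

Each entry of the result is the long-run fraction of time the system spends at that risk level under a constant attack strategy, in this toy model.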
- Research Article
- 10.14529/jsfi190105
- Mar 1, 2019
- Supercomputing Frontiers and Innovations
Experiencing tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) has also driven the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and freedom from maintenance for end-users. Besides providing and using HPCaaS, HPC centers could leverage more from Cloud Computing technology, for instance to facilitate the operation and administration of deployed HPC systems, a task faced by most supercomputer centers. This paper reports on a product, EasyOP, developed to realize the idea that one or more Cloud or HPC facilities can be run over a centralized and unified control platform. The main idea of EasyOP is that information on HPC system hardware and system software, failure alarms, job scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing steps, we are able to share valuable data, including alarm and job scheduling status, with HPC users through SMS, email, and WeChat. More importantly, with the data accumulated at the cloud computing center, EasyOP can offer several easy-to-use functions, such as user management, monthly/yearly reports, one-screen monitoring, and so on. By the end of 2016, EasyOP had successfully served more than 50 HPC systems with almost 10000 nodes and over 300 regular users.
- Dissertation
- 10.25148/etd.fidc006527
- Jun 8, 2018
The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems, and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, the power consumption of HPC systems will surely increase; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortage in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems to participate in demand response. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance.
In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC systems to participate in emergency demand response. In the final part, we propose an economic demand-response model to allow both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model allows the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
- Conference Article
- 10.1109/parcomptech.2013.6621402
- Feb 1, 2013
High Performance Computing (HPC) systems provide access to high-end resources for parallel job execution. Resource monitoring and management are the most important aspects of providing a successful HPC environment. Improving performance and reducing energy consumption and operating costs are crucial for an HPC environment. There can be different strategies for managing HPC resources, targeting energy, performance, or operating cost, depending on the overall system's state, the nature of the queued workload, and the administrator's choice. According to current research trends, there is a need to put all these strategies under one umbrella. This paper presents the design of an energy-aware framework that bundles all these strategies to autonomically identify the most suitable resource management strategy. The framework works with the help of multiple intelligent agents and also uses past knowledge of application behavior to decide on the strategy. We explain how this framework intends to reduce the energy consumption and operating cost of HPC systems by selecting the proposed energy management strategy.