Articles published on Supercomputer
- Research Article
- 10.51137/wrp.ijarbm.408
- Nov 6, 2025
- International Journal of Applied Research in Business and Management
- Reginald Mulalo Ndwamai + 1 more
Understanding how organisational support, perceived ease of use, and perceived usefulness influence the actual use of a High-Performance Computing (HPC) system sheds light on its adoption within a higher education institution. A quantitative approach with a descriptive design was employed, surveying 218 respondents, including Master's and PhD students and academic staff, using stratified and simple random sampling techniques. Data were collected through online self-administered questionnaires and analysed using SPSS version 29.0. The findings revealed strong positive correlations between organisational support, perceived ease of use, perceived usefulness, and actual use of the HPC system, supporting prior research on technology adoption (Davis, 1989; Venkatesh et al., 2003). Moderate positive correlations were found between organisational support and both perceived ease of use and perceived usefulness, suggesting that users who feel supported by their institution are more likely to find the system easy to use and valuable. Furthermore, moderate positive correlations were found between perceived ease of use and perceived usefulness, and between perceived ease of use and actual use, indicating that users who find the system simple to use are more likely to adopt it. Additionally, there was a moderate positive correlation between perceived usefulness and actual use, highlighting that users engage more with the system when they recognise its benefits. In conclusion, organisational support and user perceptions are key to successfully implementing and using new technologies.
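A minimal sketch of the kind of pairwise correlation analysis reported above, using hypothetical construct scores in place of the study's survey data (the study itself used SPSS 29.0 on 218 responses):

```python
# Minimal sketch of a pairwise Pearson correlation analysis over four
# hypothetical construct scores; these are stand-ins for averaged
# Likert-scale items, not the study's actual data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 218  # number of respondents reported in the abstract

support = rng.uniform(1, 5, n)                                 # organisational support
ease = 0.5 * support + rng.normal(0, 1, n)                     # perceived ease of use
usefulness = 0.4 * support + 0.4 * ease + rng.normal(0, 1, n)  # perceived usefulness
actual_use = 0.3 * ease + 0.5 * usefulness + rng.normal(0, 1, n)

constructs = {"support": support, "ease": ease,
              "usefulness": usefulness, "actual_use": actual_use}
names = list(constructs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r, p = pearsonr(constructs[a], constructs[b])
        print(f"{a:>11} vs {b:<11} r = {r:+.2f} (p = {p:.3g})")
```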
- Research Article
- 10.1007/s11227-025-08013-z
- Nov 6, 2025
- The Journal of Supercomputing
- Takieddine Meriouma + 3 more
Abstract Getting a precise estimate of electric fields around extra-high-voltage (EHV) transmission lines is essential for keeping the public safe, ensuring environmental compliance, and planning infrastructure effectively. Unfortunately, traditional numerical methods often struggle with accuracy and can be slow to converge, which makes them less suitable for large-scale projects. This study introduces a hybrid computational framework that combines the Charge Simulation Method (CSM) with the Firefly Algorithm (FA). This combination helps optimize the number, position, and strength of simulation charges, leading to better modeling accuracy and efficiency. Additionally, we have trained three artificial intelligence (AI) models, a Multilayer Perceptron Neural Network (MLPNN), an Adaptive Neuro-Fuzzy Inference System (ANFIS), and a Least Squares Support Vector Machine (LS-SVM), on real-world field data to reliably predict electric field values. Notably, LS-SVM is used in this context for the first time and has been shown to outperform the other models in accuracy, generalization, and speed. We evaluated the proposed CSM-FA hybrid model alongside the AI predictions using metrics such as Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R²), revealing significant improvements over traditional methods. Given the heavy computational demands of the optimization and learning phases, we utilized high-performance computing (HPC) resources for implementation. This work not only advances algorithmic innovation and AI-assisted simulation but also enhances HPC applications, providing a scalable and precise solution for real-time field monitoring and regulatory assessments. The methodology aligns well with the scientific goals of The Journal of Supercomputing and fosters advanced research in intelligent power system modeling.
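The evaluation metrics named above are standard; a minimal sketch of RMSE, MAPE, and R² on hypothetical measured versus predicted field values:

```python
# Minimal sketch of the evaluation metrics named in the abstract (RMSE, MAPE,
# R^2), applied to hypothetical measured vs. predicted electric-field values.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical field magnitudes (kV/m) along a lateral profile under a line.
measured = np.array([4.8, 5.6, 6.1, 5.9, 5.2, 4.4])
predicted = np.array([4.9, 5.5, 6.0, 6.1, 5.1, 4.5])

print(f"RMSE = {rmse(measured, predicted):.3f} kV/m")
print(f"MAPE = {mape(measured, predicted):.2f} %")
print(f"R^2  = {r2(measured, predicted):.4f}")
```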
- Research Article
- 10.1007/s11227-025-07986-1
- Nov 5, 2025
- The Journal of Supercomputing
- Edixon Parraga + 4 more
Abstract Distributed deep learning (DDL) applications generate heavy input/output (I/O) workloads that can create bottlenecks in high-performance computing (HPC) systems. Their optimal I/O configuration depends on factors such as access patterns, storage hardware, dataset size, and execution scale. This study proposes a systematic methodology for characterizing and optimizing I/O behavior in DDL applications, represented through the deep learning I/O benchmark (DLIO), and validated with the real DeepGalaxy application. We evaluate access modes, file formats, and Lustre file system configurations, demonstrating that stripe counts optimized for the access pattern and application scale can reduce I/O and execution times, achieving up to 18 GiB/s of bandwidth and a 5X increase in IOPS. HDF5 provides balanced performance, while TFRecord stands out in bandwidth-intensive scenarios. Shared access minimizes contention and improves scalability in multi-node executions. The results are consolidated into configuration guidelines that offer practical recommendations for practitioners to tune DDL applications for efficient execution in HPC environments.
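For readers tuning similar workloads, a hedged sketch of setting a Lustre stripe count on a dataset directory before a training run; it assumes the standard `lfs` client tool is available and that the path is on Lustre, and the stripe values are illustrative rather than the paper's:

```python
# Hedged sketch: apply Lustre striping to a dataset directory so that newly
# written files inherit the layout, in the spirit of the tuning described
# above. Path, stripe count, and stripe size are illustrative only.
import subprocess

def set_lustre_striping(path, stripe_count=8, stripe_size="1m"):
    """Set the default striping for a directory and return the new layout."""
    subprocess.run(
        ["lfs", "setstripe", "-c", str(stripe_count), "-S", stripe_size, path],
        check=True,
    )
    result = subprocess.run(["lfs", "getstripe", "-d", path],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(set_lustre_striping("/lustre/project/train_data", stripe_count=8))
```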
- Research Article
- 10.1021/acs.jcim.5c02081
- Nov 3, 2025
- Journal of chemical information and modeling
- Fengbo Yuan + 20 more
Artificial intelligence (AI) is reshaping computational science, but AI-driven workflows routinely span heterogeneous tasks executed across diverse high-performance computing (HPC) systems. We introduce DPDispatcher, an open-source Python framework for scalable, fault-tolerant task scheduling in such environments with an emphasis on lightweight submission, automatic retries, and robust resumption. DPDispatcher separates connection and file-staging concerns from scheduler control, supports multiple HPC job managers, and provides both local and secure shell (SSH) backends. DPDispatcher has been adopted by more than ten scientific packages. Representative use cases include active learning for machine-learning potentials, free-energy and thermodynamic integration workflows, large-scale materials screening, and large language model (LLM)-driven agents that launch HPC computations. Across these settings, DPDispatcher reduces operational overhead and error rates while improving portability and automation for reliable, high-throughput scientific computing.
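A hedged sketch of a typical submission following DPDispatcher's documented Machine/Resources/Task/Submission pattern; the host, paths, queue name, and field values are illustrative, and exact signatures may differ across versions:

```python
# Hedged sketch of batch submission with DPDispatcher; all hosts, paths, and
# resource values below are placeholders, not values from the paper.
from dpdispatcher import Machine, Resources, Task, Submission

machine = Machine(
    batch_type="Slurm",          # scheduler backend on the remote HPC system
    context_type="SSHContext",   # stage files and submit over SSH
    local_root="./",
    remote_root="/scratch/jobs",
    remote_profile={"hostname": "hpc.example.org", "username": "alice"},
)

resources = Resources(
    number_node=1, cpu_per_node=8, gpu_per_node=1,
    queue_name="gpu", group_size=4,   # pack 4 tasks into each scheduler job
)

tasks = [
    Task(command=f"python train.py --seed {i}",
         task_work_path=f"task_{i:03d}/",
         forward_files=["train.py"],
         backward_files=["model.pt", "log"])
    for i in range(16)
]

submission = Submission(work_base="work/", machine=machine,
                        resources=resources, task_list=tasks)
submission.run_submission()  # blocks until done; failed tasks are retried and
                             # an interrupted submission can be resumed on rerun
```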
- Research Article
- 10.1177/10943420251393314
- Nov 3, 2025
- The International Journal of High Performance Computing Applications
- Muhammad Rizwan + 2 more
This survey examines the optimization techniques and trends for the High-Performance Conjugate Gradient (HPCG) benchmark over the last 10 years. The HPCG benchmark was introduced to address the limitations of the High-Performance Linpack (HPL) benchmark and to provide a more realistic performance measure for modern supercomputer architectures. Our study evaluates HPCG optimizations performed by High-Performance Computing (HPC) researchers on diverse hardware architectures such as CPUs, GPUs, MICs, and FPGAs, focusing on how the reference HPCG code has been tuned in terms of data formats, parallelization strategies, and architecture-specific techniques. We review these optimizations and present a comprehensive analysis of them. This work offers the first comprehensive review of HPCG optimizations, discussing previous findings and providing a systematic analysis to inform future optimization efforts. It aims to guide researchers in identifying the most suitable directions for developing further optimization strategies for the HPCG benchmark.
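For context, a minimal sketch of the unpreconditioned conjugate gradient iteration that the benchmark is built around; the real HPCG solves a 3D 27-point-stencil problem with a multigrid-preconditioned CG:

```python
# Minimal unpreconditioned conjugate gradient on a sparse SPD matrix, as a
# stand-in for the iteration HPCG exercises (HPCG adds a multigrid
# preconditioner and a 3D 27-point stencil operator).
import numpy as np
from scipy.sparse import diags

def cg(A, b, tol=1e-8, max_iter=500):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# 1D Poisson matrix in CSR format as a simple stand-in for HPCG's operator.
n = 1000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = cg(A, b)
print("final residual norm:", np.linalg.norm(b - A @ x))
```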
- Research Article
- 10.1029/2024ms004884
- Nov 1, 2025
- Journal of Advances in Modeling Earth Systems
- Aaron Lattanzi + 10 more
Abstract High performance computing (HPC) architectures have undergone rapid development in recent years. As a result, established software suites face an ever-increasing challenge to remain performant on and portable across modern systems. Many of the widely adopted atmospheric modeling codes cannot fully (or in some cases, at all) leverage the acceleration provided by General-Purpose Graphics Processing Units (GPUs), leaving users of those codes constrained to increasingly limited HPC resources. Energy Research and Forecasting (ERF) is a regional atmospheric modeling code that leverages the latest HPC architectures, whether composed of only Central Processing Units (CPUs) or incorporating GPUs. ERF contains many of the standard discretizations and basic features needed to model general atmospheric dynamics. The modular design of ERF provides a flexible platform for exploring different physics parameterizations and numerical strategies. ERF is built on a state-of-the-art, well-supported software framework (AMReX) that provides a performance-portable interface and ensures ERF's long-term sustainability on next-generation computing systems. This paper details the numerical methodology of ERF, presents results for a series of verification/validation cases, and documents ERF's performance on current HPC systems. The roughly 5× speedup of ERF (using GPUs) over the Weather Research and Forecasting (WRF) model (CPUs only) for a 3D squall line test case highlights the significance of leveraging GPU acceleration.
- Research Article
- 10.1016/j.neunet.2025.107789
- Nov 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Hangming Zhang + 3 more
Combining aggregated attention and transformer architecture for accurate and efficient performance of Spiking Neural Networks.
- Research Article
- 10.1145/3774418
- Nov 1, 2025
- ACM Transactions on Architecture and Code Optimization
- Mathys Eliott Jam + 5 more
Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment. However, finding optimized configurations for each input and environment is challenging and requires significant manual effort and computational resources. This paper presents MLKAPS, a tool that automates this task using machine learning and adaptive sampling techniques. MLKAPS generates decision trees that tune HPC kernels’ design parameters to achieve efficient performance for any user input. MLKAPS scales to large input and design spaces, outperforming similar state-of-the-art auto-tuning tools in tuning time and mean speedup. We demonstrate the benefits of MLKAPS on the highly optimized Intel® MKL dgetrf LU kernel and show that MLKAPS finds blind spots in the manual tuning of HPC experts. It improves over 85% of the inputs with a geomean speedup of 1.31×. On the Intel® MKL dgeqrf QR kernel, MLKAPS improves performance on 85% of the inputs with a geomean speedup of 1.18×.
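Not MLKAPS itself, but a hedged sketch of the general pattern it automates: explore (input, parameter) configurations, measure kernel performance, and fit a decision tree that predicts a good parameter for unseen inputs; the runtime model and all names here are hypothetical:

```python
# Hedged sketch of decision-tree-based kernel tuning (not MLKAPS code):
# benchmark candidate block sizes over sampled input sizes, then fit a tree
# that maps an input size to the block size observed to be fastest.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def measured_runtime(n, block):
    """Hypothetical runtime model standing in for an actual kernel benchmark."""
    return n / block + 2e-3 * block * np.sqrt(n) + rng.normal(0, 0.5)

candidate_blocks = [16, 32, 64, 128, 256]
sizes = rng.integers(256, 8192, 200)

# For each sampled input size, benchmark every candidate and keep the fastest
# (a real tool would use adaptive sampling instead of an exhaustive sweep).
best_block = [min(candidate_blocks, key=lambda b: measured_runtime(n, b))
              for n in sizes]

tree = DecisionTreeRegressor(max_depth=4).fit(sizes.reshape(-1, 1), best_block)
print("suggested block size for n=3000:", int(tree.predict([[3000]])[0]))
```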
- Research Article
- 10.1016/j.cmpb.2025.108994
- Nov 1, 2025
- Computer methods and programs in biomedicine
- Wonjin Choi + 2 more
Developing a reduced order model for pulsatile blood flow simulations using minimal three-dimensional simulation data.
- Research Article
- 10.3390/civileng6040058
- Oct 31, 2025
- CivilEng
- Mu’Tasim Abdel-Jaber + 2 more
The incorporation of Basalt Fiber-Reinforced Polymer (BFRP) materials marks a significant advancement in the adoption of sustainable and high-performance technologies in structural engineering. This study investigates the flexural behavior of four-meter, two-span continuous reinforced concrete (RC) beams of low and medium compressive strengths (20 MPa and 32 MPa) strengthened or rehabilitated using near-surface mounted (NSM) BFRP ropes. Six RC beam specimens were tested, of which two were strengthened before loading and two were rehabilitated after being preloaded to 70% of their ultimate capacity. The experimental program was complemented by Finite Element Modeling (FEM) and analytical evaluations per ACI 440.2R-08 guidelines. The results demonstrated that the NSM-BFRP rope application led to a flexural strength increase ranging from 18% to 44% and improved ductility by approximately 9–11% in strengthened beams and 13–20% in rehabilitated beams, relative to the control specimens. Load-deflection responses showed close alignment between experimental and FEM results, with prediction errors ranging from 0.125% to 7.3%. This study uniquely contributes to the literature by evaluating both the strengthening and post-damage rehabilitation of continuous RC beams using NSM-BFRP ropes, a novel and eco-efficient retrofitting technique with proven performance in enhancing structural capacity and serviceability.
- Research Article
- 10.3390/electronics14214235
- Oct 29, 2025
- Electronics
- Hayong Jeong + 3 more
In modern high-performance computing (HPC) and large-scale data processing environments, the efficient utilization and scalability of memory resources are critical determinants of overall system performance. Architectures such as non-uniform memory access (NUMA) and tiered memory systems frequently suffer performance degradation due to remote accesses stemming from shared data among multiple tasks. This paper proposes LACX, a shared data migration technique leveraging Compute Express Link (CXL), to address these challenges. LACX preserves the migration cycle of automatic NUMA balancing (AutoNUMA) while identifying shared data characteristics and migrating such data to CXL memory instead of DRAM, thereby maximizing DRAM locality. The proposed method utilizes existing kernel structures and data to efficiently identify and manage shared data without incurring additional overhead, and it effectively avoids conflicts with AutoNUMA policies. Evaluation results demonstrate that, although remote accesses to shared data can degrade performance in low-tier memory scenarios, LACX significantly improves overall memory bandwidth utilization and system performance in high-tier memory and memory-intensive workload environments by distributing DRAM bandwidth. This work presents a practical, lightweight approach to shared data management in tiered memory environments and highlights new directions for next-generation memory management policies.
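A conceptual sketch (user-space Python, not kernel code) of the placement policy described above: during a migration pass, pages accessed by more than one task are directed to the CXL tier, keeping DRAM for task-private data; all names are hypothetical:

```python
# Conceptual sketch of a shared-data placement decision: pages sampled as
# accessed by multiple tasks go to CXL memory, private pages stay in DRAM.
from collections import defaultdict

def plan_migration(access_samples):
    """access_samples: iterable of (page_id, task_id) pairs from a sampling pass."""
    tasks_per_page = defaultdict(set)
    for page, task in access_samples:
        tasks_per_page[page].add(task)
    return {page: ("CXL" if len(tasks) > 1 else "DRAM (task-local)")
            for page, tasks in tasks_per_page.items()}

samples = [(0x1000, "A"), (0x1000, "B"), (0x2000, "A"), (0x3000, "B")]
for page, target in plan_migration(samples).items():
    print(hex(page), "->", target)
```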
- Research Article
- 10.54254/2755-2721/2026.ka28740
- Oct 28, 2025
- Applied and Computational Engineering
- Kunyang Pan + 1 more
Transistors are at the core of electronic technology, but traditional silicon-based transistors are facing performance bottlenecks. Carbon nanotubes (CNTs), with their excellent electrical properties and nanoscale dimensions, have become an ideal material for the next generation of transistors. This article reviews the fundamental characteristics of carbon nanotube semiconductor materials, their significant advantages over traditional silicon-based materials, the progress in transistor research and their application fields, as well as the remaining challenges faced by carbon nanotube transistors. The paper mainly introduces the preparation process of carbon nanotubes, innovations in device structure, and optimization strategies for electrical properties. In addition, the article explores the application potential of carbon nanotube transistors in fields such as integrated circuits, radio frequency electronics, display technology, high-performance computing, and sensors. Finally, the paper points out that issues of purity control, high-frequency performance bottlenecks, and integration uniformity still need to be addressed, and it outlines possible improvements and future development directions, providing a reference for the practical application of carbon nanotube transistors.
- Research Article
- 10.1364/oe.577010
- Oct 27, 2025
- Optics Express
- Ang Li + 13 more
The exponential growth of data center traffic, driven by artificial intelligence (AI) and high-performance computing, demands optical interconnect solutions that overcome the limitations of current packaging integration methods. The conventional bonding process often suffers from substantial parasitic effects, which degrade signal integrity and limit both bandwidth scalability and energy efficiency. Here, we present a monolithically integrated electronic-photonic transceiver fabricated on a 45 nm CMOS-SOI platform, featuring a co-designed Mach-Zehnder modulator (MZM), driver amplifier, Ge-Si photodetector (PD), and transimpedance amplifier (TIA) within a single chip. By eliminating bonding interfaces in optoelectronic integration, the transmitter achieves 64 Gbaud four-level pulse amplitude modulation (PAM-4) data transmission below the 5.8% overhead hard-decision (HD) forward error correction (FEC) bit error rate (BER) threshold of 3.8 × 10⁻³, while the receiver achieves 64 Gbaud PAM-4 data transmission below the 6.7% overhead KP4-FEC threshold of 2.4 × 10⁻⁴. The integrated transceiver consumes a total of 3.07 pJ/bit at 128 Gb/s. This work highlights the potential of silicon-based monolithic optoelectronic integration techniques for high-speed optical communication and interconnection, offering remarkable enhancements in system performance and scalability.
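A quick arithmetic check of the reported efficiency figure:

```python
# Energy efficiency sanity check: 3.07 pJ/bit at 128 Gb/s (64 Gbaud PAM-4,
# 2 bits per symbol) corresponds to roughly 0.39 W of transceiver power.
energy_per_bit_pj = 3.07
data_rate_gbps = 128
power_w = energy_per_bit_pj * 1e-12 * data_rate_gbps * 1e9
print(f"{power_w * 1e3:.0f} mW")  # ~393 mW
```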
- Research Article
- 10.1177/10943420251351424
- Oct 26, 2025
- The International Journal of High Performance Computing Applications
- Andreas Herten + 2 more
The field of High-Performance Computing (HPC) is defined by providing computing systems of the highest performance to a variety of demanding scientific users. The tight co-design relationship between HPC providers and users propels the field forward, paired with technological improvements, achieving continuously higher performance and resource utilization. A key tool for system architects, architecture researchers, and scientific users is the benchmark, which allows for well-defined assessment of hardware, software, and algorithms. Many benchmarks exist in the community, from individual niche benchmarks testing specific features to large-scale benchmark suites used for whole procurements. We survey the available HPC benchmarks, summarizing them in tabular form with key details and a concise categorization, also made available through an interactive website. For this categorization, we present a benchmark taxonomy that enables well-defined characterization of benchmarks.
- Research Article
- 10.1177/15485129251349540
- Oct 26, 2025
- The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology
- Paul A Clement + 3 more
During the age of above-ground nuclear weapons testing, the effect of vegetation on radiation energy deposition in soil was computationally too challenging to study. Today, improvements in high-performance computing, mature and accurate radiation transport models, and user-friendly optimization and statistical analysis software packages enable investigations into modeling and quantifying the impact of forest vegetation on the prompt gamma-ray energy deposition within the soil from an atmospheric nuclear weapon detonation. Our approach uses CUBIT® for meshing and geometry creation, MCNP® for radiation transport simulation, and Dakota to amass and statistically analyze the results. Depending on the forest parameters, there is a 0.16%–1.3% change in photon energy deposition in the soil. This research demonstrates a methodology for streamlining a complex radiation modeling effort across multiple codes to quantify and answer a nuclear weapons effects question.
- Research Article
- 10.1134/s1063779625700388
- Oct 25, 2025
- Physics of Particles and Nuclei
- A Mirzoyan + 3 more
Armenian National Supercomputing Center: Bridging Science and Technology through High-Performance Computing
- Research Article
- 10.1177/10943420251384688
- Oct 24, 2025
- The International Journal of High Performance Computing Applications
- Alexandre Dutka + 3 more
High-fidelity computational fluid dynamics (CFD) enables the study of complex and subtle fluid dynamics phenomena, but remains to this day very computationally expensive. Taking full advantage of the raw compute power provided by high-performance computing (HPC) hardware evolutions, such as the rise of GPU computing, is therefore key to making high-fidelity CFD more affordable. However, considering the diverse and fast-evolving HPC hardware landscape, long-term sustainability and software maintainability can rapidly be compromised. The use of adequate numerical methods is also key to reducing the computational cost, and discontinuous high-order methods, which combine geometric flexibility with efficient hardware use in an increasingly bandwidth-bound HPC landscape, are very promising in this regard. This work reports the implementation of such a high-order CFD solver using the open-source library Kokkos to address the performance portability and sustainability issues. Performance is investigated over a broad range of CPU and GPU architectures, demonstrating the relevance of the approach. This work also highlights the fitness of the chosen numerical method to achieve high orders of accuracy without compromising performance or scalability.
- Research Article
- 10.1080/02533839.2025.2565361
- Oct 24, 2025
- Journal of the Chinese Institute of Engineers
- Yihong Wang + 3 more
ABSTRACT With the rapid development of high-performance computing, the demand for interconnection network (IN) performance is also increasing. Reliability is an important indicator for evaluating an IN. An IN with good properties (e.g., maximal connectivity, maximal diagnosability, and low diameter) can be designed to greatly improve information transmission and reduce IN costs. In this paper, we propose the multi-graph matching composition network (MMCN) and characterize several of its important parameters, including connectivity, diagnosability, and diameter. Specifically, we establish the connectivity, diagnosability, an upper bound on the diameter, the 1-extra and 2-extra connectivity, and the 1-extra and 2-extra conditional diagnosability of MMCNs. Finally, we apply our results to a number of well-known INs, including star graphs and pancake graphs, and obtain results for INs whose parameters were previously unknown.
- Research Article
- 10.54254/2753-8818/2026.hz28300
- Oct 23, 2025
- Theoretical and Natural Science
- Wanchen Wang
Gate-stacked double-gate (DG) MOSFETs, featuring a thin SiO₂ interfacial layer combined with high-k dielectrics, improve electrostatic control, suppress leakage, and mitigate short-channel effects, enhancing device performance. They are promising for low-power electronics, high-performance computing, and biosensing. Conventional MOSFET scaling faces critical bottlenecks, as high-k dielectrics alone suffer from leakage and interface issues, while structural innovations such as FinFETs cannot fully suppress short-channel effects at advanced nodes. Gate-all-around (GAA) architectures demonstrate good performance but are excessively costly. This work proposes co-optimizing materials (Al₂O₃, HfO₂, La₂O₃) and structures (strain engineering, dual-material gates, multigate topologies) in gate-stacked DG MOSFETs, integrating high-k stacks with multigate architectures to reinforce electrostatics and scalability. Such synergy ensures enhanced performance while meeting the demands of low-power electronics, high-performance computing, and emerging biosensing applications.
- Research Article
- 10.54254/2753-8818/2025.dl28336
- Oct 23, 2025
- Theoretical and Natural Science
- Mingze Sun
The rapid expansion of the Internet of Things (IoT) and connected objects has exposed significant security challenges in data transmission, device authentication, and privacy protection, especially in resource-constrained environments. This paper provides an in-depth look at the application of elliptic curve cryptography (ECC) as a critical cryptographic solution to these challenges. It explores the mathematical foundations underlying ECC, including the fundamental concepts of elliptic curves and the elliptic curve discrete logarithm problem. It also discusses the practical application of ECC to IoT security, focusing on robust device authentication, secure data transmission and storage, and improved privacy protection mechanisms. This analysis highlights the inherent benefits of ECC, such as strong security with short key lengths, high computational efficiency, and reduced communication overhead, while also addressing challenges such as implementation complexity and standardization. Finally, the paper provides insights into selecting appropriate ECC schemes for various IoT scenarios and discusses future research directions, including the integration of quantum-safe cryptography.
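A toy sketch of the elliptic-curve group law and a Diffie-Hellman exchange, purely to illustrate the discrete logarithm structure the abstract refers to; it uses a small textbook curve, whereas real IoT deployments use standardized curves (e.g., P-256 or Curve25519) through vetted libraries rather than hand-rolled arithmetic:

```python
# Toy elliptic-curve arithmetic over F_17 on the textbook curve
# y^2 = x^3 + 2x + 2, whose group has prime order 19 with generator G = (5, 1).
P, A = 17, 2                      # field prime and curve coefficient a

def add(p, q):
    """Group law: add two points (None is the point at infinity)."""
    if p is None: return q
    if q is None: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None               # inverse points sum to the point at infinity
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, p):
    """Double-and-add scalar multiplication (the hard-to-invert ECDLP map)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, p)
        p, k = add(p, p), k >> 1
    return acc

G = (5, 1)
a, b = 3, 7                                  # toy private keys
A_pub, B_pub = mul(a, G), mul(b, G)          # exchanged public points
assert mul(a, B_pub) == mul(b, A_pub)        # both sides derive the same secret
print("shared point:", mul(a, B_pub))        # (6, 3) on this toy curve
```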