- Research Article
- 10.1007/s42514-025-00269-4
- Jan 23, 2026
- CCF Transactions on High Performance Computing
- Shaofeng Yang + 4 more
Abstract We have optimized the parallel threshold ILU algorithm (ParILUT) for GPUs. The optimizations target three building blocks: candidate search and ILU residual computation, adding and removing elements, and threshold selection. First, we fuse candidate search and ILU residual computation by modifying the ParILUT algorithm and extending the register-aware SpGEMM algorithm to compute them together. We also developed a GPU bin search algorithm that makes the register-aware SpGEMM algorithm perform better within ParILUT. Second, instead of the thread-row-parallel approach, we adopt a warp-row-parallel approach to add elements to the new L and U and to remove elements from the candidates, and we use efficient GPU instructions to locate element positions. Third, we propose a balanced classification tree for threshold selection that balances the data across buckets when many elements share the same value. Finally, we evaluated the performance of each optimization and of the whole ParILUT, and verified the correctness of the optimized ParILUT. The results indicate that the optimized ParILUT achieves an average speedup of 4.03 times over the original version, and that the speedup increases with the amount of fill-in.
- Research Article
- 10.1007/s42514-025-00266-7
- Jan 16, 2026
- CCF Transactions on High Performance Computing
- Yiming Lu + 10 more
Abstract High-performance computing (HPC) systems must remain stable and reliable to consistently deliver robust computational power and ensure the proper execution of user jobs. Anomaly detection is a key means of ensuring the stability and reliability of these systems. As HPC systems grow in scale and their architectures change, accurately identifying anomalies in dynamic environments has become increasingly challenging. Traditional detection methods rely on experience and rules, which can be inefficient and inaccurate. To address these issues, researchers have proposed machine learning-based methods that automatically process large amounts of complex data, improving the efficiency of anomaly identification and diagnosis. In this survey, we conduct a comprehensive and in-depth investigation of machine learning-based anomaly detection methods for HPC systems. First, we introduce the background and challenges of anomaly detection in HPC systems. Second, we compare a series of machine learning-based anomaly detection works in detail, summarize their frameworks, and outline their advantages, disadvantages, and application scenarios. Finally, we discuss several promising development trends in machine learning-based anomaly detection for HPC systems.
- Research Article
- 10.1007/s42514-025-00265-8
- Dec 22, 2025
- CCF Transactions on High Performance Computing
- Yonghua Hu + 3 more
- Research Article
- 10.1007/s42514-025-00268-5
- Dec 17, 2025
- CCF Transactions on High Performance Computing
- Gen Zhang + 5 more
Abstract As high performance computing (HPC) moves towards exascale, storage systems face core challenges such as data flooding, bandwidth bottlenecks, coordination of mixed workloads, and balancing performance against cost. This article systematically reviews cutting-edge technologies for high-performance storage systems, covering four aspects: storage architecture, hardware, software, and networking. At the architecture level, compute-storage separation and distributed and hierarchical architectures decouple computing and storage resources, and optimize latency and scalability through high-speed networks. Typical cases include supercomputer systems such as Frontier and Fugaku. In terms of hardware, persistent memory, all-flash arrays, and integrated storage-and-computing chips significantly improve throughput and reduce latency, while ZNS SSDs and QLC technology optimize cost and lifespan. At the software level, distributed parallel file systems handle massive numbers of small files and highly concurrent access through burst buffer technology. In network communication, low-latency interconnects such as Slingshot, InfiniBand, and RoCE support TB-level bandwidth, while CXL technology promotes storage resource pooling. In the future, photonic interconnects, AI-native architectures, and green energy-saving technologies will further drive high-performance storage towards efficiency and intelligence, supporting ZB-level storage requirements in scenarios such as exascale computing and AI training.
- Research Article
- 10.1007/s42514-025-00260-z
- Dec 8, 2025
- CCF Transactions on High Performance Computing
- Longkun Guo + 2 more
- Research Article
- 10.1007/s42514-025-00258-7
- Dec 3, 2025
- CCF Transactions on High Performance Computing
- De Dong + 5 more
- Research Article
- 10.1007/s42514-025-00264-9
- Dec 1, 2025
- CCF Transactions on High Performance Computing
- Jinfang Jia + 3 more
- Research Article
- 10.1007/s42514-025-00259-6
- Nov 28, 2025
- CCF Transactions on High Performance Computing
- Renqian Wan + 2 more
- Research Article
- 10.1007/s42514-025-00261-y
- Nov 28, 2025
- CCF Transactions on High Performance Computing
- Jiandong Shang + 7 more
- Research Article
- 10.1007/s42514-025-00252-z
- Nov 20, 2025
- CCF Transactions on High Performance Computing
- Runyu Zhou + 6 more