Articles published on Parallel computing
16921 Search results
- Research Article
- 10.1016/j.camwa.2026.01.029
- Apr 1, 2026
- Computers & Mathematics with Applications
- Jian Sun + 1 more
An enhanced MQRBF-FD method with parallel computing and multiscale modeling for efficient elastic wave propagation
- Research Article
- 10.3847/1538-4357/ae422a
- Mar 9, 2026
- The Astrophysical Journal
- Kazutaka Kimura + 4 more
We present a radiation-hydrodynamics (RHD) scheme that enables 3D simulations resolving both protostellar interiors and their surrounding accretion flows within a single framework, to clarify how a protostar evolves while interacting with the accretion flow. The method builds on an explicit two-moment M1 closure scheme with a reduced speed of light approximation (RSLA) for massively parallel computation. Our scheme introduces a complementary non-RSLA radiation component that dominates in optically thick regions. This hybrid treatment restores physical energy conservation inside protostars, which would otherwise be violated under the RSLA, while retaining the advantage of large time steps. To overcome the limitation of the conventional M1 closure in solving radiative transfer in extremely optically thick regions inside protostars and across steep optical-depth gradients near their surfaces, we incorporate the optical-depth information of neighboring cells into the radiative transfer calculation. We further evolve photon-number densities in addition to radiation energy densities to reconstruct an effective local spectrum on the fly without resorting to costly multifrequency transport. We implement this scheme in the adaptive mesh refinement code SFUMATO and verify its validity through a series of test calculations. As an application, we follow the early evolution of a massive protostar formed at high redshift, within a full cosmological context. The results reveal a continuous structure connecting the swollen protostar and its surrounding disk, which cannot be captured in conventional 1D models. This RHD scheme opens a path to studies of protostellar evolution and its interaction with the accretion flow in realistic 3D environments.
- Research Article
- 10.46298/lmcs-22(1:16)2026
- Feb 27, 2026
- Logical Methods in Computer Science
- Luidnel Maignan + 1 more
On the one hand, the formalism of Global Transformations comes with the claim of capturing any transformation of space that is local, synchronous and deterministic. The claim has been proven for different classes of models such as mesh refinements from computer graphics, Lindenmayer systems from morphogenesis modeling and cellular automata from biological, physical and parallel computation modeling. The Global Transformation formalism achieves this by using category theory for its genericity, and more precisely the notion of Kan extension to determine the global behaviors based on the local ones. On the other hand, Causal Graph Dynamics describe the transformation of port graphs in a synchronous and deterministic way and have not yet been tackled. In this paper, we show the precise sense in which the claim of Global Transformations holds for them as well. This is done by showing different ways in which they can be expressed as Kan extensions, each of them highlighting different features of Causal Graph Dynamics. Along the way, this work uncovers the interesting class of Monotonic Causal Graph Dynamics and their universality among General Causal Graph Dynamics.
- Research Article
- 10.1002/mp.70347
- Feb 26, 2026
- Medical physics
- Dejan Kuhn + 5 more
Current VMAT planning workflows for prostate cancer primarily depend on conventional dose-volume criteria specified at discrete dose or volume points. These point-based objectives, however, do not necessarily lead to globally optimal, patient-specific treatment plans. While radiobiological models such as Tumor Control Probability (TCP) and Normal Tissue Complication Probability (NTCP) can provide more meaningful, individualized targets, previous implementations have either employed these for plan evaluation or integrated biological objectives without providing a comprehensive set of deliverable trade-off plans. To date, no prescription-free, automated VMAT planning method has been introduced that generates clinically deliverable, patient-specific Pareto fronts that are biologically interpretable and useful for radiobiological trade-off analysis. The purpose of this study was to develop and clinically evaluate a fully automated, prescription-free VMAT planning framework for primary prostate cancer that generates Pareto-optimal, clinically deliverable treatment plans in radiobiological objective space, constrained by predefined TCP and NTCP levels. The proposed framework was implemented within a commercial treatment planning system (TPS). 17 patients with unfavorable intermediate-risk prostate cancer were retrospectively selected for evaluation. For each patient, TCP and NTCP levels were predefined for three target volumes and seven organs at risk (OARs), restricting the optimization to clinically meaningful regions of the solution space. Plan optimization was performed using Particle Swarm Optimization (PSO) to iteratively adjust VMAT parameters, with the complication-free tumor control probability (P+) serving as the sole objective function. All resulting, clinically deliverable plans were generated in the TPS and subsequently analyzed in the bi-objective radiobiological space defined by injury probability (PI) versus one minus the benefit probability (1 - PB). 
The plan yielding the highest P+ and the corresponding individualized pseudo-Pareto front were identified for each patient. The proposed method was benchmarked against clinical moderately hypofractionated simultaneous integrated boost (SIB) plans. The proposed prescription-independent planning approach successfully generated individualized pseudo-Pareto fronts for all 17 patients in the radiobiological space of PI versus (1 - PB). This enabled clinicians to visualize and interpret trade-offs between tumor control and normal tissue complication risk within the predefined TCP and NTCP levels. For each patient, the plan with the highest P+ achieved superior predicted tumor control and reduced normal tissue toxicity compared to manually optimized clinical plans. The method effectively individualized dose distributions according to patient-specific anatomy and tumor biology, without reliance on fixed dose prescriptions or conventional constraints. All highest-P+ treatment plans fulfilled the clinical dose requirements. Sensitivity analyses demonstrated robustness of the framework with respect to variations in TCP model parameters. This study demonstrated the feasibility of a fully automated, prescription-free VMAT planning framework for primary prostate cancer, indicating its potential for future clinical implementation. The proposed framework directly optimized treatment plans in radiobiological objective space, producing Pareto-optimal, clinically deliverable solutions using predefined TCP and NTCP levels. It enables patient-specific trade-off analysis taking into account tumor control and normal tissue complication risk. The work provides a foundation for further development, including the incorporation of geometric uncertainties, acceleration through parallel or GPU-based computation, and application to additional tumor sites.
- Research Article
- 10.57237/j.cst.2026.01.003
- Feb 26, 2026
- Computer Science and Technology
- Bao Zhenhua
University major evaluation is a systematic assessment of a university's degree programs in terms of operating conditions, teaching quality, and talent cultivation, aiming to strengthen program development and improve the quality of talent cultivation. Evaluations are usually organized by government education authorities, third-party institutions, or universities themselves, combining quantitative and qualitative methods and relying mainly on evaluation systems based on static analysis, with relatively low social participation. As industry-education integration and science-education integration penetrate ever more deeply into university teaching and research, program construction and development increasingly require participation from multiple sectors of society. To better verify a university's operating strength and level, it is necessary to construct a professional evaluation system in which multiple stakeholders participate, such as government education authorities, universities, third-party evaluation institutions, and industry enterprises. This multi-party collaborative working mode is an extremely complex systems engineering effort: it involves acquiring diverse information resources and storing and computing over massive data, and it requires a complete set of data classification and information processing techniques. Using "AI + Big Data" technology and adopting a multi-source distributed parallel computing model can effectively handle the storage and computation of this data and largely avoid human intervention in the evaluation process, providing a strong guarantee of broad participation in the evaluation and of the fairness and justice of its results.
- Research Article
- 10.3390/en19051178
- Feb 26, 2026
- Energies
- Qilong Dong + 6 more
The high-temperature engine nozzle is a critical component of a rocket motor, and its stability and performance are significantly influenced by internal high-temperature gas radiative heat transfer. Due to the non-gray nature of the nozzle medium and the complexity of the Radiative Transfer Equation (RTE), rapid and accurate simulation of radiative heat transfer is crucial for engineering applications. This paper presents a high-efficiency solution coupling the Full-Spectrum Correlated k-Distribution (FSCK) model with the Null-Collision Monte Carlo Method (NCMCM). To address the inherent computational bottleneck of linear traversal in unstructured grids, a hybrid ray-localization model integrating KD-tree and Bounding Volume Hierarchy (BVH) is proposed. This model shifts the search mechanism from element-wise iteration to spatial topological indexing, achieving logarithmic search complexity and significantly mitigating the sensitivity of computational cost to grid scale. Furthermore, a collaborative MPI–OpenMP parallel framework is established to maximize hardware utilization, where an optimized guided scheduling strategy effectively counteracts the stochastic load imbalances encountered in traditional static schemes. Results indicate that the proposed method reduces the total execution time to approximately 1/4 compared to traditional models. Simulations identify the convergent section as the primary radiation zone, where CO2 contributes less to the radiative source term than H2O under high-temperature conditions.
- Research Article
- 10.1038/s41377-025-02153-w
- Feb 23, 2026
- Light, science & applications
- Linzhi Yu + 3 more
All-optical image processing offers a high-speed, energy-efficient alternative to conventional electronic systems by leveraging the wave nature of light for parallel computation. However, traditional optical processors rely on bulky components, limiting scalability and integration. Here, we demonstrate a compact metasurface-based platform for analog optical computing. By employing double-phase encoding and polarization multiplexing, our approach enables arbitrary image transformations within a single passive nanophotonic device, eliminating the need for complex optical setups or digital post-processing. We experimentally showcase key computational operations, including first-order differentiation, cross-correlation, vertex detection, and Laplacian differentiation. Additionally, we extend this framework to high-resolution complex holography, achieving subwavelength-scale volumetric wavefront control for depth-resolved reconstructions with high fidelity. Our results establish a scalable and versatile approach to computational optics, with applications including real-time image processing, energy-efficient computing, biomedical imaging, high-fidelity holographic displays, and optical data storage, driving the advancement of intelligent optical processors.
- Research Article
- 10.31449/inf.v50i7.8356
- Feb 21, 2026
- Informatica
- Meenu Meenu
In concurrent programming, Software Transactional Memory (STM) provides an efficient mechanism for managing shared memory in parallel computations, avoiding common issues like locks and deadlocks. A crucial aspect of STM systems is the implementation of transactional variables (TVars), which significantly influence concurrency levels, execution time, and memory overhead. Two primary implementations of TVars—nested and non-nested—present distinct advantages and trade-offs. This study evaluates and compares the effects of nested and non-nested TVar implementations on STM performance, focusing on concurrency, execution time, rollback complexity, and memory overhead. Using the Haskell programming language with STM libraries under GHC version 8.6.5, both implementations were developed and tested on a system with an Intel Core i5-1035G1 CPU @ 1.20 GHz, 8 GB DDR4 RAM, and a 512 GB Intel 660p NVMe SSD running Windows 11 Pro. Each configuration executed multiple deposit and withdrawal operations over ten iterations: the non-nested version processed approximately 20 STM operations in a total time of 2.0 seconds, while the nested version performed about 50 operations in 4.0 seconds due to additional nested balance adjustments. Execution time and memory usage were measured using Haskell’s runtime and heap profiling tools (+RTS -p -hy). The results demonstrate that nested TVars improve concurrency by localizing conflicts within sub-transactions, achieving an average operational throughput approximately 25% higher than the non-nested version (0.08 seconds per operation for nested vs. 0.10 seconds for non-nested) and consuming about 38% less total heap memory (38,792 bytes vs. 63,080 bytes). Non-nested TVars provide simpler implementation with slightly faster individual execution but less effective conflict resolution under high load. These insights can guide developers in optimizing STM-based systems by selecting appropriate TVar models based on the concurrency demands and complexity of their applications.
- Research Article
- 10.1038/s42256-026-01182-3
- Feb 20, 2026
- Nature Machine Intelligence
- Shenglong Zhou + 4 more
Deep learning models are usually trained with stochastic gradient descent-based algorithms, but these optimizers face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. Here we develop an algorithm called PISA (preconditioned inexact stochastic alternating direction method of multipliers). Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables the proposed algorithm to tackle the challenge of data heterogeneity effectively. Moreover, the algorithmic architecture enables scalable parallel computing and supports various preconditions, such as second-order information, second moment and orthogonalized momentum by Newton–Schulz iterations. Incorporating the last two preconditions in PISA yields two computationally efficient variants: SISA and NSISA. Comprehensive experimental evaluations for training or fine-tuning diverse deep models, including vision models, large language models, reinforcement learning models, generative adversarial networks and recurrent neural networks, demonstrate superior numerical performance of SISA and NSISA compared with various state-of-the-art optimizers.
- Research Article
- 10.1145/3777419
- Feb 13, 2026
- ACM Transactions on Parallel Computing
- Liubov Evseeva + 4 more
This study considers approaches to the development of interactive Java algorithms for the dynamic visualization of parallel computational threads. The proposed algorithms make it possible to create visual graphical representations of parallel processes, their interactions, and data distribution. Within the framework of the research, the key approaches to visualizing parallel computational threads are analyzed, the peculiarities of the applied interactive components are assessed, and methods of integration with existing monitoring and debugging systems are considered. With the help of configurable visualization tools, developers and researchers can observe the evolution of computational threads, evaluate system performance, and react promptly to changes in the structure of parallel tasks. Implementing the algorithms on the Java platform ensures portability, broad applicability, and integration with existing frameworks for high-performance computing. The use of dynamic data structures, thread-safe collections, and the parallelism mechanisms of the language and its standard libraries allows large amounts of data to be processed efficiently in real time. In addition, the Java Virtual Machine provides profiling tools that can be applied directly to optimize the visualized processes.
- Research Article
- 10.54380/ijrdet0226_13
- Feb 12, 2026
- International Journal of Recent Development in Engineering and Technology
- Sunil Kumawat
Parallel Computing refers to a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved concurrently, thus speeding up computation and increasing efficiency.
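The divide-and-conquer idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: `chunk_sum` and `parallel_sum` are hypothetical names, and a thread pool is used for simplicity, even though CPU-bound Python code would need processes or native code to see real wall-clock speedup.

```python
# Divide-and-conquer sketch: split a large summation into chunks that
# are processed concurrently, then combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Split [0, n) into `workers` roughly equal half-open intervals.
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers - 1)]
    chunks.append(((workers - 1) * step, n))  # last chunk absorbs the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

result = parallel_sum(1_000_000)  # same value as sum(range(1_000_000))
```

Each chunk is independent, which is exactly what makes the decomposition parallelizable: no worker needs another worker's intermediate state.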
- Research Article
- 10.3390/a19020133
- Feb 6, 2026
- Algorithms
- Neeraj Dhanraj Bokde + 2 more
Metaheuristic algorithms have become essential tools for solving complex, high-dimensional, and constrained optimization problems. This paper introduces an adaptive R implementation of the parameter-free Jaya algorithm, enhanced with methodological innovations for both single-objective and multi-objective settings. The proposed framework integrates adaptive population management, dynamic constraint-handling, diversity-preserving perturbations, and Pareto-based archiving, while retaining Jaya’s parameter-free simplicity. These extensions are further supported by parallel computation and visualization tools, enabling scalable and reproducible applications. Benchmark evaluations on standard test functions demonstrate improved convergence accuracy, solution diversity, and robustness compared to the classical Jaya and other baseline algorithms. To highlight real-world applicability, the method is applied to a renewable energy planning problem, where trade-offs among cost, emissions, and reliability are explored. The results confirm that the adaptive Jaya approach can generate well-distributed Pareto fronts and provide practical decision support for energy system design. The main contributions of this work are threefold: (i) the development of an adaptive multi-objective extension of the Jaya algorithm that preserves its parameter-free philosophy while incorporating diversity preservation, dynamic constraint handling, and Pareto-based selection; (ii) a unified and openly available R implementation that integrates methodological advances with parallel computation and visualization, addressing the lack of transparent and reusable MO-Jaya tools in the existing literature; and (iii) a systematic evaluation on benchmark test functions and a renewable energy planning case study, demonstrating competitive convergence, robust Pareto diversity, and practical decision-making insights compared to established methods. 
By openly releasing the software in R (≥3.5.0), this work contributes both a methodological advance in multi-objective metaheuristics and a transparent tool for applied optimization in engineering and environmental domains.
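The parameter-free Jaya update that the adaptive R implementation above extends can be sketched as follows. This is a minimal single-objective Python illustration of the textbook rule x' = x + r1·(best − |x|) − r2·(worst − |x|), not the authors' R package; the function names and the sphere benchmark are illustrative choices.

```python
import random

def sphere(x):
    # Classic benchmark: global minimum 0 at the origin.
    return sum(xi * xi for xi in x)

def jaya(f, dim=5, pop_size=20, iters=200, lo=-5.0, hi=5.0, seed=42):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        scores = [f(x) for x in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        new_pop = []
        for x in pop:
            # Jaya update: move toward the best solution, away from the worst,
            # with per-dimension random weights r1 and r2 in [0, 1).
            cand = [min(max(xi
                            + rng.random() * (best[j] - abs(xi))
                            - rng.random() * (worst[j] - abs(xi)), lo), hi)
                    for j, xi in enumerate(x)]
            # Greedy selection: keep the candidate only if it improves.
            new_pop.append(cand if f(cand) < f(x) else x)
        pop = new_pop
    return min(pop, key=f)

best = jaya(sphere)
```

Note the absence of algorithm-specific tuning parameters (no inertia weight, crossover rate, etc.), which is the "parameter-free philosophy" the abstract refers to.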
- Research Article
- 10.1021/acs.jctc.5c01892
- Feb 4, 2026
- Journal of chemical theory and computation
- Daniel F Calero-Osorio + 1 more
We show how to add the effects of residual electron correlation to a reference seniority-zero wave function by transforming the true electronic Hamiltonian into seniority-zero form. The transformation is treated via the Baker-Campbell-Hausdorff (BCH) expansion, and the seniority-zero structure of the reference is exploited to evaluate the first three commutators exactly; the remaining contributions are handled with a recursive commutator approximation, as is typical in canonical transformation methods. By choosing a seniority-zero reference and using parallel computation, this method is practical for small- to medium-sized systems. Numerical tests show high accuracy, with errors of ∼10⁻⁴ Hartree.
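The BCH expansion referred to above is the standard textbook identity for a similarity-transformed Hamiltonian (a general formula, not one quoted from the paper):

```latex
\bar{H} \;=\; e^{-\hat{A}}\,\hat{H}\,e^{\hat{A}}
\;=\; \hat{H} + [\hat{H},\hat{A}]
+ \frac{1}{2!}\,[[\hat{H},\hat{A}],\hat{A}]
+ \frac{1}{3!}\,[[[\hat{H},\hat{A}],\hat{A}],\hat{A}] + \cdots
```

Evaluating the first few nested commutators exactly and approximating the rest recursively is the usual truncation strategy in canonical transformation methods, which is what the abstract describes.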
- Research Article
- 10.1101/gr.280940.125
- Feb 3, 2026
- Genome research
- Vikram S Shivakumar + 1 more
Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly sequenced assemblies. We previously developed Mumemto, which computes maximal unique matches (multi-MUMs) across pangenomes using compressed indexing. In this work, we introduce MumemtoM (Mumemto Merge), comprising two new partitioning and merging strategies. Both strategies enable highly parallel, memory-efficient, and updateable computation of multi-MUMs. One of the strategies, called string-based merging, is also capable of conducting the merges in a way that follows the shape of a phylogenetic tree, naturally yielding the multi-MUM for the tree's internal nodes as well as the root. With these strategies, Mumemto now scales to 474 human haplotypes, the only multi-MUM method able to do so. It also introduces a time-memory tradeoff that allows Mumemto to be tailored to more scenarios, including in resource-limited settings.
- Research Article
- 10.1371/journal.pone.0342167.r008
- Feb 3, 2026
- PLOS One
- Mohammed Alaa Ala’Anzy + 5 more
Sorting can be approached in two main ways: sequentially and in parallel. In sequential sorting, data is processed in a single-threaded manner, which can be slow for large datasets. Parallel sorting, by contrast, divides the task across multiple processing units, enabling faster results by processing data simultaneously. Furthermore, Compute Unified Device Architecture (CUDA) technology enables developers to leverage GPU power for general-purpose parallel computing, significantly accelerating tasks like sorting. This paper investigates the GPU-based parallelization of merge sort (MS), quick sort (QS), bubble sort (BS), radix top-k selection sort (RS), and slow sort (SS), presenting optimized algorithms designed for efficient sorting of large datasets using modern GPUs. The primary objective is to evaluate the performance of these algorithms on GPUs utilizing CUDA, with a focus on analyzing both parallel time complexity and space complexity across various data types. Experiments are conducted on four dataset scenarios: randomly generated data, reverse-sorted data, already-sorted data, and nearly-sorted data. The performance of GPU-accelerated implementations is also compared with their sequential counterparts to assess improvements in computational efficiency and scalability. Earlier generations of GPU-based implementations of this type typically achieved acceleration rates between 2× and 9× over scalar CPU code. With newer GPU enhancements, including parallel-aware primitives and radix- or merge-optimized operations, acceleration rates have improved significantly. Our experiments indicate that GPU-based Radix Sort achieves a significant speedup of approximately 50× (sequential: 240.8 ms, parallel: 4.83 ms) on 10 million random elements. Quick Sort and Merge Sort achieve 97× and 103× speedups, respectively (Quick: 1461.97 ms vs. 15.1 ms; Merge: 2212.33 ms vs. 21.4 ms). 
Bubble Sort, while significantly improved in parallel (123,321.9 ms to 7377.8 ms, an ≈17× improvement), remains considerably worse overall. Slow Sort demonstrates a moderate but consistent acceleration, reducing execution time from 74.07 ms in the sequential version to 3.99 ms on the GPU, yielding an ≈18.6× speedup. These experimental findings confirm that the new single-GPU implementations achieve speedups ranging from 17× to over 100×, surpassing the typical gains reported in previous generations and comparable to or exceeding the acceleration rates reported for cutting-edge parallel sorting algorithms in recent studies.
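The sort-chunks-then-merge decomposition underlying parallel sorting can be sketched as follows. This is a thread-based Python illustration of the general idea only; the paper's implementations are CUDA kernels on GPUs, and `parallel_sort` here is a hypothetical helper, not code from the paper.

```python
# Parallel-sorting sketch: sort independent chunks concurrently,
# then k-way merge the sorted runs into one sorted sequence.
import heapq
import random
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, workers=4):
    step = max(1, len(data) // workers)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge performs the k-way merge of already-sorted runs.
    return list(heapq.merge(*sorted_chunks))

rng = random.Random(0)
data = [rng.randrange(10**6) for _ in range(10**4)]
out = parallel_sort(data)  # same ordering as sorted(data)
```

The chunk-sorting phase is embarrassingly parallel, which is why GPU versions scale so well; the merge phase is the part that requires more careful parallel design (hence the "merge-optimized operations" mentioned above).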
- Research Article
- 10.1007/s10409-025-24922-x
- Feb 1, 2026
- Acta Mechanica Sinica
- Keqin Zhang + 4 more
The three-dimensional meshfree numerical manifold method based on parallel computing
- Research Article
- 10.1088/1742-6596/3174/1/012098
- Feb 1, 2026
- Journal of Physics: Conference Series
- Jiawen Chen + 2 more
A Non-gradient-based topology optimization algorithm based on parallel computing
- Research Article
- 10.1002/advs.202521293
- Feb 1, 2026
- Advanced science (Weinheim, Baden-Wurttemberg, Germany)
- Dong Gue Roe + 10 more
Major breakthroughs in artificial intelligence software have led to significant transformations across various aspects of life. However, hardware development has lagged behind, primarily due to the inherent constraints of the von Neumann architecture. Although neuromorphic devices that utilize biomimetic parallel and analog computations have emerged, they still face limitations in reducing computational load. Therefore, this study proposes a light-voltage dual-modulating synaptic transistor that can significantly lower computational load through device-level computing. This is realized using a hybrid structure of indium-gallium-zinc-oxide and InAs quantum dots, which enable two distinct memory effects ‒ one induced by light and the other by voltage ‒ within a single device. These dual-modulation capabilities are leveraged to demonstrate traffic signal optimization using a Dueling Deep Q-Network, achieving computation performance comparable to ideal software conditions. These findings highlight the potential of the fabricated device for realizing computing systems that require high energy efficiency and computational density.
- Research Article
- 10.1016/j.jpdc.2026.105241
- Feb 1, 2026
- Journal of Parallel and Distributed Computing
- Nitul Dutta + 3 more
Retraction notice to “Deep learning inspired routing in ICN using Monte Carlo Tree Search algorithm” [Journal of Parallel and Distributed Computing 150 (2021) 104–111]
- Research Article
- 10.1089/3dp.2024.0165
- Feb 1, 2026
- 3D Printing and Additive Manufacturing
- Jae Ryoung Kim + 1 more
This article presents a significant advancement in the field of three-dimensional (3D) printing. We have developed a fast parallel computation algorithm that predicts the optimal orientation for 3D printing using a general-purpose graphics processing unit (GPU). Initially designed for the central processing unit (CPU) version of support structure tomography, our algorithm has been successfully adapted to NVIDIA graphics processors and the CUDA toolkit. Despite encountering several challenges, we have achieved a remarkable improvement in calculation speed. Two CPUs and four GPUs of various prices and performance levels were used for the speed comparison. The high-end GPU showed surprising multiprocessing performance: a maximum speedup of 16.2 times on the Dragon mesh data, and 11.2 times on average, over the high-end CPU. The proposed method assumes that the input triangles are larger than the voxel size and relatively few in number, about tens of thousands. Nevertheless, the algorithm has the potential to significantly enhance the efficiency and quality of 3D printing, addressing some constraints and expanding its practical applications.