Articles published on Computational Complexity
62748 Search results
- New
- Research Article
- 10.1088/1361-6501/ae2cbb
- Jan 8, 2026
- Measurement Science and Technology
- Zicheng Lin + 1 more
Abstract Road damage detection is a critical task for ensuring traffic safety and maintaining infrastructure integrity. While deep learning-based detection methods are now widely adopted, they still face two core challenges: first, the inadequate multi-scale feature extraction capabilities of existing networks for diverse targets like cracks and potholes, leading to high miss rates for small-scale damage; and second, the substantial parameter counts and computational demands of mainstream models, which hinder their deployment for efficient, real-time detection in practical applications. To address these issues, this paper proposes a high-precision and lightweight model, YOLO-Road Orthogonal Compact (YOLO-ROC). We designed a Bidirectional Multi-scale Spatial Pyramid Pooling Fast (BMS-SPPF) module to enhance multi-scale feature extraction and implemented a hierarchical channel compression strategy to reduce computational complexity. The BMS-SPPF module leverages a bidirectional spatial-channel attention mechanism to improve the detection of small targets. Concurrently, the channel compression strategy reduces the parameter count from 3.01M to 0.89M and GFLOPs from 8.1 to 2.6. Experiments on the RDD2022-China Drone dataset demonstrate that YOLO-ROC achieves a mAP50 of 67.6%, surpassing the baseline YOLOv8n by 1.4%. Notably, the recall rate for the small-target D40 category improved by 19%, and the final model size is only 2.0 MB. Furthermore, the model exhibits excellent generalization performance on the RDD2022-China Motorbike dataset.
- New
- Research Article
- 10.1088/1361-6501/ae3022
- Jan 7, 2026
- Measurement Science and Technology
- Hongwei Fan + 2 more
Abstract In response to the low detection accuracy and limited adaptability of algorithms caused by significant differences in target characteristics in multi-source dynamic target detection tasks for coal mine conveyors, this paper proposes Lite-DSP-YOLO-P2, a lightweight intelligent detection algorithm. First, considering the differences in image quality and target properties between coal flow states and unsafe personnel behavior scenarios, a differentiated data augmentation strategy is adopted. For unsafe personnel behavior images, a combination of classical augmentation and GridMask is employed to enhance the detection of occluded targets. For coal flow images, an improved DeblurGAN-v2 is used for deblurring prior to classical augmentation, thereby improving structural clarity and detection accuracy. Second, to enhance the algorithm's perception of small targets and complex backgrounds, a P2 detection layer is constructed based on the Dynamic Multi-scale Fusion and Triple Feature Encoder (DyMsF-TFE), which integrates shallow and deep features to improve multi-scale representation capability. Meanwhile, a Lightweight Shared Detail Enhanced-convolutional Detection Head (LSDEcDH) is designed, incorporating Group Normalization (GN) and Detail-enhanced Convolution (DeC) to reduce parameters while improving localization accuracy. Finally, the Layer-Adaptive Magnitude-based Pruning (LAMP) method is applied to compress the model structure, achieving a better balance between detection performance and computational complexity and enhancing its applicability and efficiency in resource-constrained environments. Experimental results show that the differentiated data augmentation strategy significantly improves the quality and diversity of training samples and enhances the algorithm's perception and detection performance for multi-source dynamic targets in complex scenarios.
The proposed Lite-DSP-YOLO-P2 achieves a precision of 95.2%, recall of 90.9%, mAP0.5 of 95.1%, and mAP0.5:0.95 of 67.0%, which represents improvements of 3.5%, 1.1%, 2.7%, and 1.5%, respectively, compared to the YOLOv8n baseline. Furthermore, the algorithm's parameters are reduced by 24.6%, and after applying LAMP pruning, both the parameters and model size are reduced by 51.9% and 39.7%, respectively, while maintaining detection accuracy.
- New
- Research Article
- 10.1088/2631-8695/ae30cb
- Jan 1, 2026
- Engineering Research Express
- Chandrashekar M Patil + 1 more
Abstract Iris recognition is one of the most reliable biometric identification techniques due to the uniqueness and stability of its patterns. Deep learning techniques have emerged as a powerful approach for developing more accurate and robust iris recognition systems. However, the real-time implementation of deep architectures for iris recognition presents notable challenges, primarily due to substantial computational and memory demands. In this paper, we present a novel lightweight deep convolutional neural network architecture that effectively addresses the trade-off between classification accuracy and computational complexity. The pre-processing pipeline is designed to accurately localize and segment the iris image; it comprises the Circular Hough Transform (CHT) for precise iris localization, occlusion removal for handling eyelids and eyelashes, and Contrast-Limited Adaptive Histogram Equalization (CLAHE) for photometric enhancement. In the proposed deep architecture, depthwise convolutions efficiently extract spatial features from each input channel independently, significantly reducing computational cost, while pointwise convolutions enable channel-wise information fusion to learn discriminative and compact feature representations. The traditional softmax layer is replaced with an SVM classifier using a Radial Basis Function (RBF) kernel, which enhances non-linear decision boundary learning and generalization through the max-margin principle. The proposed model outperforms state-of-the-art pretrained models with a recognition accuracy of 99.3% and an equal error rate (EER) of 0.3% on a multi-source benchmark iris dataset (CASIA and MMU1), and demonstrates strong cross-sensor interoperability. The proposed framework offers a promising solution for real-time iris recognition in applications with limited computational resources.
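The parameter savings the abstract attributes to depthwise plus pointwise convolutions can be made concrete with a quick count. The layer sizes below are illustrative assumptions, not the paper's actual architecture:

```python
# Parameter-count comparison: standard vs. depthwise-separable convolution.
# Channel counts and kernel size are illustrative, not taken from the paper.

def standard_conv_params(c_in, c_out, k):
    # Every output channel mixes all input channels with a k x k kernel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel (spatial features only).
    depthwise = c_in * k * k
    # Pointwise: 1 x 1 convolution fuses channel-wise information.
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(64, 128, 3)        # 73,728 parameters
sep = depthwise_separable_params(64, 128, 3)  # 8,768 parameters
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a 3x3 kernel the saving approaches roughly 9x as the output channel count grows, which is the usual motivation for this factorization in lightweight models.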
- New
- Research Article
- 10.1016/j.cmpb.2025.109101
- Jan 1, 2026
- Computer methods and programs in biomedicine
- Haotian Tang + 6 more
Lightweight element-wise product enhanced neural network for efficient arrhythmia detection on embedded devices.
- New
- Research Article
- 10.1109/tpami.2025.3605660
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Yulan Guo + 6 more
Convolutional neural networks are constructed from massive numbers of operations of different types and are highly computationally intensive. Among these operations, multiplication has higher computational complexity and usually consumes more energy and inference time than other operations, which hinders the deployment of convolutional neural networks on mobile devices. In many resource-limited edge devices, complicated operations can be calculated via lookup tables to reduce computational cost. Motivated by this, in this paper we introduce a generic and efficient lookup operation that can be used as a basic operation for the construction of neural networks. Instead of calculating the multiplication of weights and activation values, simple yet efficient lookup operations are adopted to compute their responses. To enable end-to-end optimization of the lookup operation, we construct the lookup tables in a differentiable manner and propose several training strategies to promote their convergence. By replacing computationally expensive multiplication operations with our lookup operations, we develop lookup networks for image classification, image super-resolution, and point cloud classification tasks. We demonstrate that our lookup networks benefit from the lookup operations to achieve higher efficiency in terms of energy consumption and inference speed while maintaining performance competitive with vanilla convolutional networks. Extensive experiments show that our lookup networks produce state-of-the-art performance on different tasks (both classification and regression) and different data types (both images and point clouds).
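The core idea of replacing multiplications with table lookups can be sketched as follows. This is a minimal illustration under simplifying assumptions: uniform (not learned) codebooks and toy values; the paper's differentiable tables and training strategies are not reproduced here:

```python
# Sketch of lookup-based "multiplication": quantize weights and activations to
# small codebooks, precompute all pairwise products once, then replace each
# multiply at inference time with a table lookup. Codebooks here are uniform,
# a crude stand-in for the learned tables described in the paper.

def make_codebook(values, levels):
    # Uniform codebook spanning the value range.
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]

def quantize(x, codebook):
    # Index of the nearest codebook entry.
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))

weights = [0.5, -1.25, 2.0, 0.75]
acts = [1.0, 0.25, -0.5, 1.5]

wb = make_codebook(weights, 16)
ab = make_codebook(acts, 16)

# Precompute every possible product: 16 x 16 = 256 entries, built once.
table = [[w * a for a in ab] for w in wb]

# Dot product via lookups only -- no multiplications at inference time.
looked_up = sum(table[quantize(w, wb)][quantize(a, ab)]
                for w, a in zip(weights, acts))
exact = sum(w * a for w, a in zip(weights, acts))
print(looked_up, exact)
```

The lookup result approximates the exact dot product up to quantization error, which is the accuracy/efficiency trade-off the learned tables are designed to minimize.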
- New
- Research Article
- 10.1016/j.neunet.2025.107981
- Jan 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yiran Cai + 4 more
Tensorized anchor alignment for incomplete multi-view clustering.
- New
- Research Article
- 10.1109/tpami.2025.3604614
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Enxin Song + 5 more
Recently, integrating video foundation models and large language models to build video understanding systems can overcome the limitations of task-specific vision models. Yet existing methods either employ complex spatial-temporal modules or rely heavily on additional perception models to extract temporal features for video understanding, performing well only on short videos. For long videos, the computational complexity and memory costs associated with long-term temporal connections increase significantly, posing additional challenges. Leveraging the hierarchical memory structure of the Atkinson-Shiffrin memory model, with tokens in Transformers employed as the carriers of memory, we propose MovieChat with a training-free memory consolidation mechanism that overcomes these challenges by transferring dense frames from short-term memory into sparse tokens in long-term memory, temporally merging adjacent frames. We lift pre-trained large multi-modal models to understand long videos without additional trainable modules, employing a zero-shot approach. Additionally, in our new version, MovieChat+, we design an enhanced training-free vision-question matching-based memory consolidation mechanism to better anchor predictions to relevant visual content. MovieChat achieves state-of-the-art performance in long video understanding, along with the released MovieChat-1K benchmark with 1K long videos, 2K temporal grounding labels, and 14K manual annotations.
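The consolidation step described above, merging the most similar adjacent frames until memory fits a budget, can be sketched in a few lines. The embeddings, similarity measure, and token budget below are toy assumptions, not MovieChat's actual implementation:

```python
# Sketch of temporal memory consolidation in the spirit of MovieChat:
# repeatedly merge the most similar pair of adjacent frame embeddings until
# the long-term memory fits a fixed token budget.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def consolidate(frames, budget):
    frames = [list(f) for f in frames]
    while len(frames) > budget:
        # Find the most redundant adjacent pair...
        i = max(range(len(frames) - 1),
                key=lambda j: cosine(frames[j], frames[j + 1]))
        # ...and replace it with its element-wise mean.
        merged = [(a + b) / 2 for a, b in zip(frames[i], frames[i + 1])]
        frames[i:i + 2] = [merged]
    return frames

# Four dense frame embeddings; the two near-duplicate pairs get merged first.
dense = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.98]]
sparse = consolidate(dense, 2)
print(len(sparse))
```

Because merging is restricted to temporally adjacent frames, the sparse memory preserves event order, which matters for temporal grounding.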
- New
- Research Article
- 10.1016/j.marpolbul.2025.118537
- Jan 1, 2026
- Marine pollution bulletin
- Conggong Lin + 2 more
Lightweight underwater debris detection model based on improved RT-DETR.
- New
- Research Article
- 10.1109/tpami.2025.3605239
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Ye Li + 7 more
The troublesome model size and quadratic computational complexity in token quantity pose significant deployment challenges for Vision Transformers (ViTs) in practical applications. Although recent advances in model pruning and token reduction techniques speed up the inference of ViTs, these approaches either adopt a fixed sparsity ratio or overlook the meaningful interplay between architectural optimization and token selection. Consequently, such static, single-dimension compression often leads to pronounced accuracy degradation under aggressive compression rates, as it fails to fully explore redundancies across these two orthogonal dimensions. Therefore, we introduce PRANCE, a framework that jointly optimizes activated channels and tokens on a per-sample basis, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses challenges to both architectural and decision-making aspects. First, while ViTs inherently support variable-token inference, they do not facilitate dynamic computations for variable channels. To overcome this limitation, we propose a meta-network using weight-sharing techniques to support arbitrary channels of the Multi-Head Self-Attention (MHSA) and Multi-Layer Perceptron (MLP) layers, serving as a foundational model for architectural decision-making. Second, simultaneously optimizing the model structure and input data constitutes a combinatorial optimization problem with an extremely large decision space, reaching up to around $10^{14}$, making supervised learning infeasible. To this end, we design a lightweight selector employing the Proximal Policy Optimization (PPO) algorithm for efficient decision-making. Furthermore, we introduce a novel "Result-to-Go" training mechanism that models ViTs' inference process as a Markov decision process, significantly reducing the action space and mitigating delayed-reward issues during training.
Additionally, our framework simultaneously supports different kinds of token optimization methods such as pruning, merging, and sequential pruning-merging strategies. Extensive experiments demonstrate the effectiveness of PRANCE in reducing FLOPs by approximately 50%, retaining only about 10% of tokens while achieving lossless Top-1 accuracy.
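The token-pruning dimension of the design above can be illustrated with a toy score-and-keep step. The norm-based scoring and fixed keep ratio here are stand-in assumptions; PRANCE's learned PPO selector and meta-network are not reproduced:

```python
# Toy sketch of per-sample token pruning: score each token (embedding norm as
# a stand-in for a learned importance signal) and keep only the top fraction,
# preserving the original token order.

def prune_tokens(tokens, keep_ratio):
    # Importance scores; a learned policy would produce these per sample.
    scores = [sum(x * x for x in t) ** 0.5 for t in tokens]
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens, restored to sequence order.
    keep = sorted(sorted(range(len(tokens)),
                         key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep]

tokens = [[0.1, 0.1], [2.0, 1.0], [0.0, 0.05], [1.5, 1.5]]
kept = prune_tokens(tokens, 0.5)
print(len(kept))
```

A per-sample policy would replace the fixed `keep_ratio` with a value chosen for each input, which is precisely the dynamic behavior the paper argues fixed-sparsity methods lack.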
- New
- Research Article
- 10.1109/tpami.2025.3603631
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Xiangmin Han + 3 more
High-order correlations, which capture complex interactions among multiple entities, extend beyond traditional graph representations and support a wider range of applications. However, existing neural network models for high-order correlations encounter scalability issues on large datasets due to the substantial computational complexity involved in processing large-scale structures. In addition, long-tailed distributions, which are common in real-world data, result in underrepresented categories and hinder the model's ability to learn effective high-order interaction patterns for rare instances. To address these issues, we introduce a novel framework known as HyperGraph-based High-order Correlation analysis (HGHC) for large-scale long-tailed data classification. Firstly, to tackle the long-tailed distribution problem, HGHC generates synthetic vertices and computes their attributed high-order correlations using an oversampling module inspired by SMOTE, termed HSMOTE, to enhance the representation of tail categories. Secondly, for efficient computational scaling, we treat the data as having two modalities: the structural modality capturing high-order relationships and the feature modality representing individual attributes. We perform computations on both CPU and GPU separately and then fuse the results to achieve a lightweight vertex transformation and aggregation scheme for high-order correlation data. Additionally, we contribute the first benchmark for large-scale long-tailed datasets involving high-order correlations, known as Amazon-LT, which includes multiple datasets with varying imbalance ratios. Our experimental results demonstrate that HGHC achieves state-of-the-art performance in handling high-order correlation analysis issues for large-scale, long-tailed data.
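The oversampling idea behind the HSMOTE module follows classic SMOTE: a synthetic tail-class sample is drawn on the segment between a point and one of its same-class neighbors. The data points and random draw below are toy values, and the hypergraph-specific attribute computation is not reproduced:

```python
# Minimal SMOTE-style oversampling: interpolate between a tail-class point and
# a same-class neighbor to create a synthetic sample.

import random

def smote_sample(x, neighbor, rng):
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

rng = random.Random(0)
tail_point = [1.0, 2.0]
neighbor = [3.0, 4.0]
synthetic = smote_sample(tail_point, neighbor, rng)
# The synthetic point lies on the segment between the two inputs.
print(synthetic)
```

In the hypergraph setting, the same interpolation would also be used to derive the synthetic vertex's high-order incidence attributes, as the abstract describes.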
- New
- Research Article
- 10.1109/tnnls.2025.3606750
- Jan 1, 2026
- IEEE transactions on neural networks and learning systems
- Eunho Lee + 1 more
Despite the strong performance of transformers, the quadratic computational complexity of self-attention presents challenges in applying them to vision tasks. Linear attention reduces this complexity from quadratic to linear, offering a strong computation-performance tradeoff. To further optimize this, automatic pruning is an effective method to find a structure that maximizes performance within a target resource budget through training, without heuristic approaches. However, directly applying it to multihead attention is not straightforward due to channel mismatch. In this article, we propose an automatic pruning method to deal with this problem. Different from existing methods that rely solely on training without any prior knowledge, we integrate channel similarity-based weights into the pruning indicator to preserve the more informative channels within each head. We then adjust the pruning indicator to enforce that channels are removed evenly across all heads, thereby avoiding channel mismatch. We incorporate a reweight module to mitigate information loss due to channel removal and introduce an effective pruning indicator initialization for linear attention, based on the attention differences between the original structure and each channel. By applying our pruning method to the FLattenTransformer, which incorporates both original and linear attention mechanisms, on ImageNet-1K, we achieve a 30% reduction in FLOPs in a near-lossless manner. The pruned model also achieves a 1.96% accuracy gain over DeiT-B while reducing FLOPs by 37%, and a 1.05% accuracy increase over Swin-B with a 10% FLOPs reduction. The proposed method outperforms previous state-of-the-art efficient models and recent pruning methods.
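The quadratic-to-linear reduction mentioned above rests on one algebraic fact: with a positive feature map phi, attention (phi(Q) phi(K)^T) V can be regrouped as phi(Q) (phi(K)^T V), so the n x n similarity matrix never needs to be built. The sketch below uses phi = elu(x) + 1, a common choice but an assumption here, not necessarily the map used in this paper:

```python
# Quadratic vs. linear attention: the two orderings compute identical outputs,
# but the second avoids the n x n matrix (O(n) in sequence length n).

import math

def phi(row):
    # elu(x) + 1 elementwise: a simple positive feature map.
    return [math.exp(v) if v < 0 else v + 1.0 for v in row]

def quadratic_attention(Q, K, V):
    # O(n^2): materialize the full n x n similarity matrix, row-normalize.
    Qf, Kf = [phi(q) for q in Q], [phi(k) for k in K]
    out = []
    for qf in Qf:
        sims = [sum(a * b for a, b in zip(qf, kf)) for kf in Kf]
        z = sum(sims)
        out.append([sum(sims[j] * V[j][d] for j in range(len(V))) / z
                    for d in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    # O(n): accumulate phi(K)^T V (d x d_v) and phi(K)^T 1 (d) once,
    # then reuse them for every query.
    Kf = [phi(k) for k in K]
    d, dv, n = len(Kf[0]), len(V[0]), len(V)
    KV = [[sum(Kf[j][a] * V[j][b] for j in range(n)) for b in range(dv)]
          for a in range(d)]
    Ksum = [sum(Kf[j][a] for j in range(n)) for a in range(d)]
    out = []
    for q in Q:
        qf = phi(q)
        z = sum(a * b for a, b in zip(qf, Ksum))
        out.append([sum(qf[a] * KV[a][b] for a in range(d)) / z
                    for b in range(dv)])
    return out

Q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
K = [[0.2, 0.1], [-0.2, 0.4], [0.5, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a, b = quadratic_attention(Q, K, V), linear_attention(Q, K, V)
print(a)
```

Both functions return the same values; only the associativity of the matrix products changes, which is what makes the linear variant attractive for long token sequences.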
- New
- Research Article
- 10.1016/j.bios.2025.118105
- Jan 1, 2026
- Biosensors & bioelectronics
- Aihua Li + 5 more
Feature extraction and intelligent diagnosis of ECG signals based on KANs and xLSTM.
- New
- Research Article
- 10.1016/j.neunet.2025.108068
- Jan 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Jiuzhou Chen + 2 more
Large-margin Softmax loss using synthetic virtual class.
- New
- Research Article
- 10.1016/j.neunet.2025.107978
- Jan 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Mohammad Mahdi Abedi + 2 more
Gabor-enhanced physics-informed neural networks for fast simulations of acoustic wavefields.
- New
- Research Article
- 10.1109/tpami.2025.3599629
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Said Ouala + 5 more
Neural Ordinary Differential Equations (NODEs) serve as continuous-time analogs of residual networks. They provide a system-theoretic perspective on neural network architecture design and offer natural solutions for time series modeling, forecasting, and applications where invertible neural networks are essential. However, these models suffer from slow performance due to heavy numerical solver overhead. For instance, a popular solution for training and inference of NODEs consists of using adaptive step-size solvers such as Dormand-Prince 5(4) (DOPRI). These solvers dynamically adjust the Number of Function Evaluations (NFE) as the equation fits the training data and becomes more complex. However, this comes at the cost of an increased number of function evaluations, which reduces computational efficiency. In this work, we propose a novel approach: making the parameters of the numerical integration scheme trainable. By doing so, the numerical scheme dynamically adapts to the dynamics of the NODE, resulting in a model that operates with a fixed NFE. We compare the proposed trainable solvers with state-of-the-art approaches, including DOPRI, on different benchmarks, including classification, density estimation, and dynamical system modeling. Overall, we report state-of-the-art performance on all benchmarks in terms of accuracy metrics, while enhancing computational efficiency through trainable fixed-step-size solvers. This work opens up new possibilities for practical and efficient modeling applications with NODEs.
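A trainable fixed-step scheme can be sketched with the one-parameter family of 2-stage explicit Runge-Kutta methods: the stage position alpha would be learned jointly with the NODE, while every step costs exactly two function evaluations. This is an illustrative sketch, not the paper's solver; alpha = 0.5 (the classical midpoint rule) stands in for a trained value:

```python
# One-parameter 2-stage explicit Runge-Kutta family: choosing b2 = 1/(2*alpha)
# preserves 2nd-order accuracy for any alpha in (0, 1]. A fixed step count
# gives a fixed NFE (2 evaluations per step), unlike adaptive solvers.

import math

def rk2_step(f, t, y, h, alpha):
    k1 = f(t, y)
    k2 = f(t + alpha * h, y + alpha * h * k1)
    b2 = 1.0 / (2.0 * alpha)
    return y + h * ((1.0 - b2) * k1 + b2 * k2)

def integrate(f, y0, t0, t1, steps, alpha):
    h = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):  # fixed NFE: 2 * steps evaluations in total
        y = rk2_step(f, t, y, h, alpha)
        t += h
    return y

# Test problem: dy/dt = -y, y(0) = 1, so y(1) = e^{-1}.
y1 = integrate(lambda t, y: -y, 1.0, 0.0, 1.0, 50, alpha=0.5)
print(y1, math.exp(-1.0))
```

In the trainable-solver setting, alpha (and, for higher-stage schemes, the other Butcher coefficients) would be optimized by gradient descent along with the network weights, letting the scheme adapt to the learned dynamics at a constant evaluation budget.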
- New
- Research Article
- 10.1148/rycan.250110
- Jan 1, 2026
- Radiology. Imaging cancer
- Chad A Arledge + 3 more
Purpose To develop and evaluate an image-to-image conditional generative adversarial network (cGAN) for translating dynamic contrast-enhanced (DCE) MRI data to vascular pharmacokinetic permeability maps. Materials and Methods Retrospective breast cancer DCE MR images from The Cancer Imaging Archive acquired between April 1996 and January 1998 were used to assess the developed cGAN. The extended Tofts model (ETM) was applied to establish reference standard volume transfer constant (Ktrans) maps. The cGAN was trained to learn relationships between DCE MR data and ETM Ktrans maps. Linear regression was applied to determine agreement between the ETM and cGAN. Logistic regression and paired t tests were used to assess predictive capabilities of pathologic response. Results Twenty DCE MRI scans (n = 2400 sections) from 10 female patients (mean age, 45 years ± 12 [SD]) were analyzed. Computation time was reduced over 1000-fold using the cGAN compared with the ETM. The cGAN Ktrans maps exhibited excellent spatial agreement and high structural similarity to the ETM, with low errors (normalized root mean squared error ≤0.32; normalized mean absolute error ≤0.16) and a strong correlation (R2 ≥ 0.98). Patients with pathologic complete response demonstrated a 60% reduction in cGAN Ktrans (P = .01) after the first cycle of neoadjuvant chemotherapy, closely matching ETM Ktrans (59%, P = .02). In contrast, patients without pathologic complete response showed a modest reduction in cGAN Ktrans (17%, P = .13), still in good agreement with the ETM (15%, P = .19). Percentage of Ktrans change effectively distinguished patients with or without pathologic complete response (C statistic = 1.0) for both models. Conclusion The DCE to pharmacokinetic cGAN offers promise for standardizing pharmacokinetic analysis and reducing computational complexity at DCE MRI. Moreover, this approach demonstrated potential for early prediction of breast cancer responses to neoadjuvant chemotherapy. 
Keywords: Dynamic Contrast-enhanced MRI, Vascular Permeability, Image-to-Image Conditional Generative Adversarial Network, Breast Cancer, Neoadjuvant Chemotherapy © RSNA, 2025.
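The extended Tofts model that serves as the reference standard above maps an arterial input function C_p(t) to tissue concentration via C_t(t) = v_p C_p(t) + Ktrans * integral of C_p(tau) exp(-k_ep (t - tau)) dtau. A minimal discrete-convolution sketch, with an illustrative input function and parameter values rather than the study's data:

```python
# Forward extended Tofts model (ETM) by discrete convolution. The per-voxel
# inverse problem (fitting Ktrans, k_ep, v_p) is what makes ETM analysis
# costly and what the cGAN is trained to bypass.

import math

def extended_tofts(cp, dt, ktrans, kep, vp):
    ct = []
    for i in range(len(cp)):
        # Convolve the plasma curve with the exponential washout kernel.
        conv = sum(cp[j] * math.exp(-kep * (i - j) * dt) * dt
                   for j in range(i + 1))
        ct.append(vp * cp[i] + ktrans * conv)
    return ct

# Toy arterial input function: fast rise, slow washout.
dt = 0.1
cp = [5.0 * t * math.exp(-t) for t in (i * dt for i in range(60))]
ct = extended_tofts(cp, dt, ktrans=0.25, kep=0.5, vp=0.05)
print(max(ct))
```

Conventional ETM analysis repeats a nonlinear fit of this forward model at every voxel, which is the computation the reported 1000-fold cGAN speedup replaces.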
- New
- Research Article
- 10.1016/j.neunet.2025.107972
- Jan 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yining Xie + 4 more
Multi-view parallel convolutional network for organ segmentation in mediastinal region on CT images.
- New
- Research Article
- 10.1016/j.media.2025.103816
- Jan 1, 2026
- Medical image analysis
- Peng Li + 2 more
Anatomical structure-guided joint spatiotemporal graph embedding framework for magnetic resonance fingerprint reconstruction.
- New
- Research Article
- 10.1016/j.media.2025.103792
- Jan 1, 2026
- Medical image analysis
- Ziyao Zhang + 5 more
Switch-UMamba: Dynamic scanning vision Mamba UNet for medical image segmentation.
- New
- Research Article
- 10.1007/978-3-032-05162-2_23
- Jan 1, 2026
- Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention
- Minheng Chen + 9 more
Understanding the organization of human brain networks has become a central focus in neuroscience, particularly in the study of functional connectivity, which plays a crucial role in diagnosing neurological disorders. Advances in functional magnetic resonance imaging and machine learning techniques have significantly improved brain network analysis. However, traditional machine learning approaches struggle to capture the complex relationships between brain regions, while deep learning methods, particularly Transformer-based models, face computational challenges due to their quadratic complexity in long-sequence modeling. To address these limitations, we propose a Core-Periphery State-Space Model (CP-SSM), an innovative framework for functional connectome classification. Specifically, we introduce Mamba, a selective state-space model with linear complexity, to effectively capture long-range dependencies in functional brain networks. Furthermore, inspired by the core-periphery (CP) organization, a fundamental characteristic of brain networks that enhances efficient information transmission, we design CP-MoE, a CP-guided Mixture-of-Experts that improves the representation learning of brain connectivity patterns. We evaluate CP-SSM on two benchmark fMRI datasets: ABIDE and ADNI. Experimental results demonstrate that CP-SSM surpasses Transformer-based models in classification performance while significantly reducing computational complexity. These findings highlight the effectiveness and efficiency of CP-SSM in modeling brain functional connectivity, offering a promising direction for neuroimaging-based neurological disease diagnosis. Our code is available at https://github.com/m1nhengChen/cpssm.