Articles published on Multi-scale Features
- New
- Research Article
- 10.1016/j.asoc.2025.113632
- Nov 1, 2025
- Applied Soft Computing
- Quanyu E + 3 more
Hierarchical multiscale feature fusion spectral transformer for the generation of medical hyperspectral image
- New
- Research Article
- 10.1016/j.oceaneng.2025.122109
- Nov 1, 2025
- Ocean Engineering
- Jiangfan Feng + 1 more
LMFEN: Lightweight multi-scale feature enhancement network for underwater object detection in AUVs
- New
- Research Article
- 10.1016/j.knosys.2025.114469
- Nov 1, 2025
- Knowledge-Based Systems
- Shuyan Cheng + 5 more
DMFP: Dynamic multiscale feature perturbations for transferable adversarial attacks
- New
- Research Article
- 10.1016/j.neunet.2025.107783
- Nov 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Jinghan Wu + 6 more
MsDUNE: A multi-scale masked temporal fusion framework for speaker-independent lipreading via Dirichlet uncertainty estimation.
- New
- Research Article
- 10.3390/en18215736
- Oct 31, 2025
- Energies
- Wei He + 5 more
Accurate load forecasting of central air conditioning (CAC) systems is crucial for enhancing energy efficiency and minimizing operational costs. However, the complex nonlinear correlations among meteorological factors, water system dynamics, and cooling demand make this task challenging. To address these issues, this study proposes a novel hybrid forecasting model termed IWOA-BiTCN-BiGRU-SA, which integrates the Improved Whale Optimization Algorithm (IWOA), Bidirectional Temporal Convolutional Networks (BiTCN), Bidirectional Gated Recurrent Units (BiGRU), and a Self-attention mechanism (SA). BiTCN is adopted to extract temporal dependencies and multi-scale features, BiGRU captures long-term bidirectional correlations, and the self-attention mechanism enhances feature weighting adaptively. Furthermore, IWOA is employed to optimize the hyperparameters of BiTCN and BiGRU, improving training stability and generalization. Experimental results based on real CAC operational data demonstrate that the proposed model outperforms traditional methods such as LSTM, GRU, and TCN, as well as hybrid deep learning benchmark models. Compared to all comparison models, the root mean square error (RMSE) decreased by 13.72% to 56.66%. This research highlights the application potential of the IWOA-BiTCN-BiGRU-SA framework in practical load forecasting and intelligent energy management for large-scale CAC systems.
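As an illustration of how a figure such as "RMSE decreased by 13.72% to 56.66%" is typically computed, the following minimal sketch evaluates RMSE for two forecasts and the relative reduction between them. The load values and the helper names `rmse` and `relative_reduction` are hypothetical and are not taken from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical cooling-load values, for illustration only.
actual = [10.0, 12.0, 14.0]
baseline_forecast = [12.0, 14.0, 16.0]   # every point is off by 2
proposed_forecast = [11.0, 13.0, 15.0]   # every point is off by 1

baseline_rmse = rmse(actual, baseline_forecast)
proposed_rmse = rmse(actual, proposed_forecast)
reduction = relative_reduction(baseline_rmse, proposed_rmse)
```

With these made-up numbers the baseline RMSE is 2, the proposed RMSE is 1, and the relative reduction is 50%.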
- New
- Research Article
- 10.63367/199115992025103605009
- Oct 31, 2025
- Journal of Computers
- Hsiao-Yu Wang + 2 more
Steel surface defect detection is critical to quality control, yet performance often degrades across datasets due to variations in imaging conditions, resolutions, and annotation styles. We propose a deep learning framework that combines multi-scale feature fusion, metric learning, and few-shot classification to improve cross-dataset robustness under scarce labels. A ResNet-50 backbone with a Feature Pyramid Network captures both fine-grained and high-level patterns across scales. On top of these features, a triplet-loss–based metric learning module enforces intra-class compactness and inter-class separability, mitigating domain shift. To handle rare or newly emerging defects, we employ a prototypical classifier that computes class prototypes from few labeled samples, enabling fast adaptation without extensive retraining. Evaluations on the NEU and Severstal datasets demonstrate superior generalization compared with Faster R-CNN and YOLO-series baselines, achieving higher mean Average Precision (mAP) and faster convergence. In few-shot settings (e.g., 1/5/10-shot), our approach maintains balanced precision–recall and substantially narrows the gap to full-data performance. These results indicate that integrating multi-scale representation learning with discriminative metric embeddings and prototype-based inference provides a scalable, data-efficient solution for reliable steel defect detection across heterogeneous production environments.
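The prototype-based inference step described in this abstract can be sketched as follows. This is a minimal illustration of a generic prototypical classifier, assuming Euclidean distance and mean-pooled prototypes; the 2-D embeddings, class names, and function names are hypothetical and do not reproduce the authors' implementation.

```python
import math

def class_prototypes(support):
    """Average the embeddings of each class to get one prototype per class.

    `support` maps a class label to a list of equal-length embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes

def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-D embeddings for a 2-shot, 2-class episode.
support = {
    "scratch": [[0.0, 0.0], [0.2, 0.0]],
    "pit": [[1.0, 1.0], [1.0, 0.8]],
}
prototypes = class_prototypes(support)
prediction = classify([0.9, 1.0], prototypes)
```

Because a new class only requires averaging a handful of embeddings, adding a defect type in this scheme does not require retraining the backbone, which is the adaptation property the abstract refers to.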
- New
- Research Article
- 10.3390/infrastructures10110289
- Oct 31, 2025
- Infrastructures
- Jing Pu + 6 more
Low-light image enhancement in architectural scenes presents a considerable challenge for computer vision applications in construction engineering. Images captured in architectural settings during nighttime or under inadequate illumination often suffer from noise interference, low-light blurring, and obscured structural features. Although low-light image enhancement and deblurring are intrinsically linked when emphasizing architectural defects, conventional image restoration methods generally treat these tasks as separate entities. This paper introduces an efficient and robust Frequency-Space Recovery Network (FSRNet), specifically designed for low-light image enhancement in architectural contexts, tailored to the unique characteristics of such scenes. The encoder utilizes a Feature Refinement Feedforward Network (FRFN) to achieve precise enhancement of defect features while dynamically mitigating background redundancy. Coupled with a Frequency Response Module, it modifies the amplitude spectrum to amplify high-frequency components of defects and ensure balanced global illumination. The decoder utilizes InceptionDWConv2d modules to capture multi-directional and multi-scale features of cracks. When combined with a gating mechanism, it dynamically suppresses noise, restores the spatial continuity of defects, and eliminates blurring. This method also reduces computational costs in terms of parameters and MAC operations. To assess the effectiveness of the proposed approach in architectural contexts, this paper conducts a comprehensive study using low-light defect images from indoor concrete walls as a representative case. Experimental results indicate that FSRNet not only achieves state-of-the-art PSNR performance of 27.58 dB but also enhances the mAP of the downstream YOLOv8 detection model by 7.1%, while utilizing only 3.75 M parameters and 8.8 GMACs. These findings fully validate the superiority and practicality of the proposed method for low-light image enhancement tasks in architectural settings.
- New
- Research Article
- 10.3390/aerospace12110978
- Oct 31, 2025
- Aerospace
- Zhongkang Yin + 5 more
The limited availability of in-situ images of the lunar surface significantly hinders the performance improvement of intelligent algorithms, such as scientific target point-of-interest recognition. To address the low diversity of images generated by traditional data augmentation methods under small-sample conditions, we propose a single-image generative adversarial method based on a blending mechanism of effective channel attention and spatial attention (ECSA-SinGAN). First, an effective channel attention module is introduced to assign different weights to each channel, enhancing the feature representation of important channels. Second, a spatial attention module is employed to assign varying weights to different spatial locations within the image, thereby improving the representation of target regions. Finally, based on a blending mechanism, lunar surface in-situ images are generated step by step, following a pyramidal hierarchy for multi-scale feature extraction. Experimental results show that the proposed method reduces MS-SSIM by 41% compared with SinGAN under identical image quality conditions in the lunar surface in-situ image augmentation task. The method preserves the original image style while significantly improving data diversity, making it effective for small-sample lunar surface in-situ image augmentation.
- New
- Research Article
- 10.54097/xhtpna28
- Oct 31, 2025
- Journal of Computer Science and Artificial Intelligence
- Qing Gan + 4 more
With the rapid development of digital multimedia technology, images, as an important carrier of information dissemination, have been widely applied in fields such as healthcare, security, commerce, and social networking. However, images are highly susceptible to tampering, duplication, and illegal use during transmission and storage, posing severe challenges to their authenticity and integrity. Traditional image authentication techniques exhibit significant deficiencies in terms of security, robustness, and invisibility, making them difficult to meet the increasing security demands. This paper proposes a novel image authentication method that integrates Sparse Approximation (SA) and Quantum Encryption (QE), aiming to enhance the security and anti-attack capabilities of digital images. The method first performs subsampling and sparsification on the watermark image, extracts multi-scale features of the image using Discrete Wavelet Transform (DWT), and generates a highly random measurement matrix through quantum logic mapping to achieve encryption and exchange of sparse coefficients. Subsequently, Singular Value Decomposition (SVD) is employed to embed the encrypted watermark information into the low-frequency components of the host image, ensuring the invisibility and robustness of the watermark. Experimental results demonstrate that the proposed method exhibits excellent performance in resisting noise, geometric transformations, and enhancement attacks. When the correct key is used, the watermark can be accurately recovered, while the use of an incorrect key results in complete distortion of the watermark, effectively preventing illegal extraction. The research presented in this paper provides an efficient and secure technical path for digital image copyright protection and content authentication.
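To make the idea of wavelet-based multi-scale feature extraction concrete, here is a minimal 1-D sketch. It assumes the Haar wavelet and an input whose length is divisible by 2 at every level; the abstract does not state which wavelet is used, and the paper works on 2-D images, so this is an illustration of the general technique only. The function names are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal 1-D Haar DWT.

    Returns (approximation, detail) coefficients, each half the input length.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_multiscale(signal, levels):
    """Repeatedly decompose the approximation to get details at several scales."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)  # details[0] is the finest scale
    return approx, details

# Hypothetical 4-sample signal decomposed over two scales.
approx, details = haar_multiscale([4.0, 4.0, 2.0, 0.0], levels=2)
```

The final approximation holds the low-frequency content, which is the kind of component the abstract says the watermark is embedded into, while each entry of `details` captures variation at one scale.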
- New
- Research Article
- 10.1080/01431161.2025.2564908
- Oct 31, 2025
- International Journal of Remote Sensing
- Muzi Chen + 9 more
With the advancement of real-time object detection technology, maintaining high detection accuracy for small objects across multiple scales remains challenging. Conventional convolutional neural networks (CNNs) struggle to effectively capture multi-scale features, often failing to meet detection requirements. This study proposes RMRN-DETR, an optimized remote sensing image detection network based on multi-dimensional real-time detection and domain adaptation. First, we introduce a Multi-dimensional Real-time detection module (MR) to achieve efficient end-to-end accuracy improvement. Second, a Multi-dimensional Domain Adaptation module is proposed to address feature fusion across different scales, effectively capturing both low-level and high-level semantic information in a multi-scale hierarchy. Finally, a novel loss boundary regression module is introduced to enhance bounding box regression accuracy, precisely reflecting the discrepancy between predicted and ground-truth boxes. Experimental results demonstrate a 1.8% accuracy improvement over the baseline on the ROSD dataset and a 2.9% gain on the DIOR dataset. The proposed method significantly enhances the detection accuracy and efficiency of small objects in remote sensing images, demonstrating strong adaptability to complex multi-scale scenarios.
- New
- Research Article
- 10.1371/journal.pone.0333999
- Oct 30, 2025
- PLOS One
- Binbin Tu + 5 more
Automatic recognition of ground-based clouds is crucial for meteorology and especially for the operational safety of Unmanned Aerial Vehicles (UAVs), but it is challenged by variable cloud shapes, complex lighting, and background interference. This paper introduces ALGA-DenseNet, an improved DenseNet model with a multi-scale attention mechanism. The model employs Color Jitter to enhance image robustness and improve learning of intra-class variations and inter-class differences. It incorporates Adaptive Local and Global Attention (ALGA) to merge features, enhancing feature selection. Additionally, it integrates mixed and depthwise separable convolutions to optimize multi-scale feature extraction, reducing parameters and computational complexity. Furthermore, integrating a Vision Transformer (ViT) and Dynamic Multi-head Attention (DMA) enhances representation of complex cloud features. Experimental results show recognition accuracies of 97.94% on the TJNU (Tianjin Normal University) Ground-based Cloud Dataset (GCD) and 97.25% on the Cirrus Cumulus Stratus Nimbus (CCSN) dataset. This indicates the model’s capability for fine-grained, multi-scale extraction of cloud textures, shapes, and color features, along with strong generalization performance.
- New
- Research Article
- 10.1088/2631-8695/ae1281
- Oct 30, 2025
- Engineering Research Express
- Peisheng Sang + 2 more
Due to the small amount and fragmented distribution of fault data, cross-domain simultaneous fault diagnosis of ship propulsion systems faces significant challenges. To address this issue, this paper proposes a novel hybrid framework, multi-source domain multi-scale joint domain adaptation multi-label classification (MMJ-DAML), for simultaneous fault diagnosis. The framework integrates multi-scale feature extraction to capture characteristics at different scales, multi-source joint domain adaptation to mitigate distribution shifts across operational conditions, and multi-label classification to model complex fault interdependencies. Experimental results on a ship degradation dataset demonstrate that MMJ-DAML achieves an average diagnostic accuracy of over 94% under diverse working conditions and domain adaptations. The study highlights the framework’s strong generalization capability in data-scarce scenarios and provides a practical solution for simultaneous fault diagnosis of real ship propulsion systems.
- New
- Research Article
- 10.1088/1361-6501/ae0fb8
- Oct 30, 2025
- Measurement Science and Technology
- Chirong Li + 3 more
Unmanned aerial vehicles (UAVs) offer a cost-effective and flexible solution for road surface monitoring. However, real-time pavement defect detection from drone perspectives remains challenging due to limited onboard resources and the complex appearance of defects. To address this, this paper proposes Drone’s Pavement Detection Transformer (DP-DETR), a real-time defect detection model based on Real-Time Detection Transformer (RT-DETR). Specifically, a lightweight CSP-ShuffleNetV2 backbone is adopted to enhance efficiency. For accurate detection of diverse defect types, a Dynamic Deformable Crack Perception Network is introduced. Moreover, a Reparameterized Multi-Scale Feature Fusion Architecture (RepMSF) is designed to strengthen multi-scale feature representation. Evaluated on the RDD2022_ChinaDrone dataset, DP-DETR achieves an mAP@50 of 72.3%, while reducing parameters by 40.93% and computation (GFLOPs) by 31.04% compared to the baseline. The model runs at 58.1 FPS, demonstrating a superior balance between detection accuracy and real-time performance.
- New
- Research Article
- 10.1038/s41598-025-21903-9
- Oct 30, 2025
- Scientific Reports
- Mingrong Li + 6 more
The encoder–decoder paradigm has emerged as the prevailing framework in medical image segmentation, and recent studies within this paradigm have demonstrated its remarkable effectiveness for lesion delineation. However, because the encoder compresses high-dimensional inputs and the decoder must reconstruct the target from the encoder’s limited latent representation, a fixed encoder–decoder pipeline inevitably introduces a semantic gap between the two stages. To bridge this gap, we present MAFormer, a novel U-shaped network tailored for medical image segmentation. Specifically, we design a Multi-scale Dependency Feature Construction (MDFC) module that refines the skip-connection pathway to fuse semantic information across hierarchical levels. In addition, we propose an Attention Representation Reinforcement Module (ARRM) that strengthens encoder–decoder semantic alignment via bidimensional similarity computation and a hierarchical masking strategy. Extensive experiments on GlaS, Synapse and ISIC2018 datasets confirm that MAFormer consistently surpasses state-of-the-art encoder–decoder methods on both large and small scale datasets. In particular, it achieves higher Dice scores, underscoring the effectiveness of MAFormer in improving overall segmentation accuracy.
- New
- Research Article
- 10.1088/1361-6560/ae19c9
- Oct 30, 2025
- Physics in medicine and biology
- Sijia Liu + 4 more
Magnetic Particle Imaging (MPI) is an emerging imaging technique based on superparamagnetic iron oxide nanoparticles, offering high sensitivity and rapid imaging. However, in measurement-based MPI, image quality is degraded by noise arising during both the system matrix calibration procedure and the signal acquisition process. This study aims to develop a deep learning-based model for efficient noise suppression to enhance MPI image quality.
Approach: We propose a hybrid encoder-decoder network integrating residual blocks (Res-Blocks) and swin transformer modules. The model employs a multi-scale feature extraction strategy to disentangle noise from valid signals, coupled with cross-level feature fusion to optimize frequency-domain recovery. 
Main results: Model performance was evaluated on a simulated dataset, the OpenMPI dataset, and a dataset acquired from in-house MPI systems. The denoised system matrix achieved an average 12 dB improvement in signal-to-noise ratio (SNR). Reconstructed images showed better visual quality, with a peak signal-to-noise ratio (PSNR) of 29.11 dB and a structural similarity index (SSIM) of 0.93, which outperformed the compared approaches.
Significance: This work provides a robust solution for noise suppression in the system matrix to enhance MPI image quality. The noise suppression framework is extensible to other system matrix-based medical imaging modalities.
- New
- Research Article
- 10.1038/s41598-025-21887-6
- Oct 30, 2025
- Scientific Reports
- Ying Han + 6 more
Ship target tracking and detection are essential procedures in the shipping industry that guarantee ship traffic and marine safety. However, issues including complicated background interference, multi-scale object recognition, and inadequate training for small sample recognition are common with classical detection techniques. To address these challenges, this study proposes an enhanced ship multi-target detection model utilizing a modified YOLOv8 algorithm. In addition to integrating the ESSE module and GSConvns technology into the YOLOv8 backbone, the YOLOv8n model acts as the baseline and integrates Wise-IoU technology. This improvement greatly increased multi-scale feature extraction’s efficacy while maintaining the fewest possible parameters. With this change, lightweight fusion is accomplished, and the model’s capacity to extract semantic characteristics from ship photos is enhanced, particularly when it comes to identifying targets against intricate backdrops. According to testing results on the Dockship, Seaships, and Infrared Offshore Ship datasets, the enhanced algorithm’s average detection accuracy is 82.1%, 99.1%, and 91.7%, respectively. This is a considerable improvement over the baseline model. Furthermore, the model’s ability to improve the features, lightweighting, and detecting capabilities of different ship kinds has been validated by IoU computation and ablation experimental analysis. These results highlight how the suggested approach could improve ship target detection’s automation, dependability, and quality.
- New
- Research Article
- 10.1088/1361-6501/ae18ed
- Oct 29, 2025
- Measurement Science and Technology
- Dandan Wang + 5 more
Polar motion prediction is a core technology supporting geomagnetic navigation and space environment monitoring. It directly impacts application efficacy in critical fields, including spacecraft orbit control, geomagnetic field modelling, and space hazard early warning. Due to simplified assumptions, traditional empirical prediction models suffer from significant accuracy degradation in long-term extrapolation forecasts. At the same time, existing deep learning methods are constrained by their limited ability to capture multi-scale features, making effective modelling challenging. This paper proposes a multi-scale hybrid prediction model named LSVMD+Informer to address these issues. The model innovatively integrates three key techniques: Least Squares (LS) periodic decomposition, Variational Mode Decomposition (VMD) for residual feature extraction, and the Informer method for long-sequence time-series prediction. Combining these approaches constructs a multi-scale feature decoupling and hierarchical temporal modelling framework. This integration effectively resolves the coupling problem of multi-scale information under complex patterns. Experiments were conducted using high-precision observational data from 2002 to 2022, with rolling predictions performed for 2022–2025 data. A three-dimensional error analysis system was established to compare the model with IERS Bulletin A and the LS+Informer baseline model. The results show that LSVMD+Informer outperforms the Bulletin A model with significant accuracy improvements. The PMX achieves an average accuracy gain of 20.20%, reaching up to 28.49% in some cases. Similarly, the average improvement for the PMY is 26.28%, with a maximum increase of 33.35%. The results demonstrate that LSVMD+Informer significantly improves the accuracy and robustness of polar motion prediction. This method exhibits precise capabilities in capturing complex periodic features, proving its effectiveness in modelling intricate temporal patterns.
- New
- Research Article
- 10.3390/en18215686
- Oct 29, 2025
- Energies
- Jia Huang + 6 more
Conventional power load forecasting frameworks face limitations in dynamic spatial topology capture and long-term dependency modeling. To address these issues, this study proposes a hybrid GAT-CNN-LSTM architecture for enhanced short-term power load forecasting. The model integrates three core components synergistically: Graph Attention Network (GAT) dynamically captures spatial correlations via adaptive node weighting, resolving static topology constraints; a CNN-LSTM module extracts multi-scale temporal features—convolutional kernels decompose load fluctuations, while bidirectional LSTM layers model long-term trends; and a gated fusion mechanism adaptively weights and fuses spatio-temporal features, suppressing noise and enhancing sensitivity to critical load periods. Experimental validations on multi-city datasets show significant improvements: the model outperforms baseline models by a notable margin in error reduction, exhibits stronger robustness under extreme weather, and maintains superior stability in multi-step forecasting. This study concludes that the hybrid model balances spatial topological analysis and temporal trend modeling, providing higher accuracy and adaptability for STLF in complex power grid environments.
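A gated fusion mechanism of the kind mentioned in this abstract is commonly written as a sigmoid gate that blends two feature vectors element by element. The sketch below assumes that common form; the abstract does not give the exact formulation, so the gate parameters `w_s`, `w_t`, `bias` and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(spatial, temporal, w_s, w_t, bias):
    """Fuse two feature vectors with an element-wise gate.

    gate_i  = sigmoid(w_s[i] * spatial[i] + w_t[i] * temporal[i] + bias[i])
    fused_i = gate_i * spatial[i] + (1 - gate_i) * temporal[i]
    """
    fused = []
    for s, t, ws, wt, b in zip(spatial, temporal, w_s, w_t, bias):
        gate = sigmoid(ws * s + wt * t + b)
        fused.append(gate * s + (1.0 - gate) * t)
    return fused

# Hypothetical features; zero gate parameters give a gate of 0.5 everywhere,
# so the result is the plain average of the two inputs.
fused = gated_fusion(
    spatial=[1.0, 0.0],
    temporal=[0.0, 2.0],
    w_s=[0.0, 0.0],
    w_t=[0.0, 0.0],
    bias=[0.0, 0.0],
)
```

In a trained model the gate parameters would be learned, letting the network lean on spatial features in some dimensions and temporal features in others; each fused value always lies between the two inputs it blends.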
- New
- Research Article
- 10.1088/1361-6501/ae0e97
- Oct 29, 2025
- Measurement Science and Technology
- Meng Xu + 4 more
Accurate remaining useful life (RUL) prediction of rolling bearings is essential for ensuring the safety and reliability of industrial machinery. However, existing deep learning approaches often struggle to capture nonlinear degradation dynamics and to disentangle evolving spatiotemporal dependencies across multiple degradation modes. To overcome these limitations, this paper proposes a novel Spatio-Temporal Degradation Disentanglement Network (STDD-Net). The framework incorporates an early degradation point detection mechanism that combines statistical thresholding and CEEMD-based energy–autocorrelation analysis to ensure that only meaningful degradation stages are modeled. Multi-scale nonlinear features extracted via CEEMD and compressed by PCA are then processed by a Degradation-Oriented Feature Decomposition (DOFD) module, which explicitly decouples spatial hierarchies and temporal dynamics. The proposed architecture consists of two collaborative branches: one branch performs capsule-based spatial modeling, and the other employs LSTM for temporal sequence modeling. These branches are jointly optimized through an adaptive fusion strategy with uncertainty-aware regression. Extensive experiments on two benchmark datasets demonstrate that STDD-Net consistently outperforms advanced methods, achieving notable reductions in RMSE and MAE, while ablation studies further verify the essential contributions of each submodule. These results highlight that STDD-Net provides a robust and generalizable solution for RUL forecasting under diverse operating conditions.
- New
- Research Article
- 10.3390/biology14111515
- Oct 29, 2025
- Biology
- Shuai Fang + 5 more
To address the challenges posed by complex background interference, varying target sizes, and high species diversity in bird detection tasks in the Dongting Lake region, this paper proposes an enhanced bird detection model named Birds-YOLO, based on the YOLOv11 framework. First, the EMA mechanism is introduced to replace the original C2PSA module. This mechanism synchronously captures global dependencies in the channel dimension and local detailed features in the spatial dimension, thereby enhancing the model’s robustness in cluttered environments. Second, the model incorporates an improved RepNCSPELAN4-ECO module that integrates depthwise separable convolution modules with an adaptive channel compression mechanism to strengthen feature extraction and multi-scale feature fusion, effectively enhancing the detection capability for bird targets at different scales. Finally, the neck component of the network is redesigned using lightweight GSConv convolution, which integrates the principles of grouped and spatial convolutions. This design preserves the feature modeling capacity of standard convolution while incorporating the computational efficiency of depthwise separable convolution, thereby reducing model complexity without sacrificing accuracy. Experimental results show that, compared to the baseline YOLOv11n, Birds-YOLO achieves a 5.0% improvement in recall and a 3.5% increase in mAP@0.5 on the CUB200-2011 dataset. On the in-house DTH-Birds dataset, it gains 3.7% in precision, 3.7% in recall, and 2.6% in mAP@0.5, demonstrating consistent performance enhancement across both public and private benchmarks. The model’s generalization ability and robustness are further validated through extensive ablation studies and comparative experiments, indicating its strong potential for practical deployment in bird detection tasks in complex natural environments such as Dongting Lake.