Related Topics

  • Spatial Attention
  • Top-down Attention
  • Temporal Attention
  • Cross-modal Attention
  • Feature-based Attention

Articles published on Attentional modulation

12095 search results, sorted by recency
  • New
  • Research Article
  • 10.1021/acs.jpclett.5c03057
Dual-Path Global Awareness Transformer for Optical Chemical Structure Recognition.
  • Dec 8, 2025
  • The journal of physical chemistry letters
  • Rui Wang + 3 more

Optical chemical structure recognition (OCSR), reconstructing structural information from chemical graphics into machine-readable sequences, has recently emerged as a focal research topic at the intersection of materials science and computer science. Existing multimodal fusion methods possess limited awareness of global context and often produce erroneous sequences when encountering complex motifs, such as rings or long chains. To address this issue, we propose the dual-path global awareness transformer (DGAT) for sequence generation. Its cascaded global feature enhancement (CGFE) module emphasizes global context and bridges cross-modal gaps, while the sparse differential global-local attention (SDGLA) module dynamically captures fine-grained differences between global and local features. We have constructed a new evaluation data set, and DGAT achieves state-of-the-art (SOTA) performance, reaching BLEU-4 = 0.840 (+5.3%), ROUGE-L = 0.908 (+1.9%), and mean Tanimoto similarity = 0.988 (+1.2%) over the best published model, confirming its ability to generate sequences that are both symbolically accurate and chemically precise.
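Of the metrics reported above, mean Tanimoto similarity is the chemistry-specific one: it compares binary molecular fingerprints of the predicted and reference structures. A minimal sketch (the example fingerprints are illustrative, not from the paper):

```python
import numpy as np

def tanimoto_similarity(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints:
    shared on-bits divided by total distinct on-bits."""
    a, b = fp_a.astype(bool), fp_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0

# Toy 8-bit fingerprints sharing 3 of 5 distinct on-bits
fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp2 = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(tanimoto_similarity(fp1, fp2))  # 0.6
```

A score of 0.988, as reported, means the decoded molecules are nearly bit-identical to the references on average.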

  • New
  • Research Article
  • 10.1177/10775463251406121
Vibration signal analysis and condition monitoring of centrifugal pumps based on time-delay phase space reconstruction and attention mechanism
  • Dec 7, 2025
  • Journal of Vibration and Control
  • Ziheng Tang + 6 more

Centrifugal pumps are the core driving equipment in fluid transport systems, and their operating conditions directly affect the stability and efficiency of the system. Although monitoring changes in physical quantities such as vibration signals can realize condition monitoring of pump operation, the accuracy of traditional signal analysis methods for monitoring the pump's working status is relatively low. To address this limitation, this paper proposes a classification method that applies a convolutional neural network with an attention mechanism to time-delay phase space reconstructed images, improving the accuracy of condition monitoring. The method transforms one-dimensional time series signals into high-dimensional spatial trajectories through time-delay phase space reconstruction, which provides more detailed feature information for the convolutional neural network at relatively low computational cost, and analyzes the stability of centrifugal pump operation through high-dimensional spatial trajectory analysis. The method combines channel and spatial attention mechanisms to further enhance feature extraction, enabling the network to better capture key features in pump vibration signals, thereby recognizing and classifying vibration signals under different conditions and improving the accuracy and robustness of condition monitoring. The results show that, in experiments identifying five different types of signals, the method achieved a test set accuracy of 99.2%, an improvement of approximately 7.25% over one-dimensional convolution, and required about 28% fewer training epochs than two-dimensional convolution without attention modules at the same accuracy level.
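The time-delay phase space reconstruction at the heart of this pipeline is the classical Takens embedding: each row of the trajectory collects delayed copies of the signal. A minimal sketch (the embedding dimension and delay values are illustrative):

```python
import numpy as np

def time_delay_embed(signal: np.ndarray, dim: int, tau: int) -> np.ndarray:
    """Reconstruct a phase-space trajectory from a 1-D signal:
    row i is [x(i), x(i+tau), ..., x(i+(dim-1)*tau)]."""
    n = len(signal) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    return np.stack([signal[i * tau : i * tau + n] for i in range(dim)], axis=1)

t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)                        # stand-in for a vibration signal
traj = time_delay_embed(x, dim=3, tau=10)
print(traj.shape)                    # (380, 3)
```

The resulting trajectory (or an image of it) is what the attention-equipped CNN then classifies.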

  • New
  • Research Article
  • 10.1142/s0219720025500167
Early lifespan prediction in Caenorhabditis elegans via contrastive learning and channel attention.
  • Dec 6, 2025
  • Journal of bioinformatics and computational biology
  • Miaomiao Jin + 2 more

Early lifespan prediction in Caenorhabditis elegans faces the challenges of indistinct discriminative signals, subtle and localized key features, difficulty in data annotation, and poor generalization. We propose Contrastive Learning-guided Channel Attention Modulation (CLCAM), in which supervised contrastive learning clusters individuals with the same lifespan and separates different classes. The resulting embedding drives channel-wise gains that are additively coupled to the backbone, thereby amplifying subtle morphological cues. At inference, the contrastive branch is removed, keeping FLOPs essentially unchanged with a modest runtime cost on our hardware. On a public dataset, CLCAM achieves an AUC-ROC of 0.84, showing a consistent improvement over the EfficientNet-B3 baseline (0.82) and a substantial gain over the prior WormNet model (0.61). Grad-CAM indicates attention focused on the pharynx and body-wall musculature, supporting the biological plausibility of the model's decisions. CLCAM offers a clear, low-overhead paradigm for early lifespan phenotyping. CLCAM code is available at https://github.com/JMM502/CLCAM/tree/master/clcam.
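The supervised contrastive objective that drives CLCAM's clustering can be sketched with the generic SupCon loss (Khosla et al., 2020); this is not the paper's exact implementation, just the standard formulation it builds on:

```python
import numpy as np

def supcon_loss(z: np.ndarray, labels: np.ndarray, temp: float = 0.1) -> float:
    """Supervised contrastive loss: for each anchor, pull embeddings with
    the same label together and push other labels apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # L2-normalise
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    logits = np.where(not_self, z @ z.T / temp, -np.inf)  # exclude self-pairs
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    has_pos = pos.sum(axis=1) > 0                         # anchors with a positive
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(-per_anchor.mean())

# Tight same-class clusters give a lower loss than scattered ones
z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
clustered = supcon_loss(z, np.array([0, 0, 1, 1]))
scattered = supcon_loss(z, np.array([0, 1, 0, 1]))
print(clustered < scattered)  # True
```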

  • New
  • Research Article
  • 10.1109/tpami.2025.3640233
Harnessing Lightweight Transformer With Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation.
  • Dec 5, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Xinyu Liu + 4 more

Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we consider two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight transformer designed to achieve model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interaction with minimal parameters. Furthermore, we introduce a Contextual Synergic Enhancement (CSE) learning strategy, which aims to boost the data efficiency of Transformers. It first leverages the extrinsic contextual information to support the learning of unlabeled data with Attention-Guided Replacement, then applies Spatial Masking Consistency that utilizes intrinsic contextual information to enhance the spatial context reasoning for unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% Jaccard while drastically reducing the FLOPs by 90.8% and parameters by 85.8%. Code is released at https://github.com/CUHK-AIM-Group/Light-UNETR.
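The abstract does not spell out the internals of the Compact Gated Linear Unit, but it builds on the standard gated linear unit, where one half of a projection gates the other half elementwise. A shape-level sketch (weights are random stand-ins, not learned values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_unit(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Generic GLU: project to 2*d channels, then value * sigmoid(gate).
    The gate learns which channel interactions to pass through."""
    h = x @ w + b                      # (..., 2*d)
    value, gate = np.split(h, 2, axis=-1)
    return value * sigmoid(gate)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, 8 channels
w = rng.normal(size=(8, 16)) * 0.1     # projects to 2*8 channels
b = np.zeros(16)
out = gated_linear_unit(x, w, b)
print(out.shape)                       # (4, 8)
```

The "compact" variant presumably keeps this gating with fewer parameters; the abstract only states the selective channel-interaction goal.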

  • New
  • Research Article
  • 10.3390/rs17243939
DLiteNet: A Dual-Branch Lightweight Framework for Efficient and Precise Building Extraction from Visible and SAR Imagery
  • Dec 5, 2025
  • Remote Sensing
  • Zhe Zhao + 5 more

High-precision and efficient building extraction by fusing visible and synthetic aperture radar (SAR) imagery is critical for applications such as smart cities, disaster response, and UAV navigation. However, existing approaches often rely on complex multimodal feature extraction and deep fusion mechanisms, resulting in over-parameterized models and excessive computation, which makes it challenging to balance accuracy and efficiency. To address this issue, we propose a dual-branch lightweight architecture, DLiteNet, which functionally decouples the multimodal building extraction task into two sub-tasks: global context modeling and spatial detail capturing. Accordingly, we design a lightweight context branch and spatial branch to achieve an optimal trade-off between semantic accuracy and computational efficiency. The context branch jointly processes visible and SAR images, leveraging our proposed Multi-scale Context Attention Module (MCAM) to adaptively fuse multimodal contextual information, followed by a lightweight Short-Term Dense Atrous Concatenate (STDAC) module for extracting high-level semantics. The spatial branch focuses on capturing textures and edge structures from visible imagery and employs a Context-Detail Aggregation Module (CDAM) to fuse contextual priors and refine building contours. Experiments on the MSAW and DFC23 Track2 datasets demonstrate that DLiteNet achieves strong performance with only 5.6 M parameters and extremely low computational costs (51.7/5.8 GFLOPs), significantly outperforming state-of-the-art models such as CMGFNet (85.2 M, 490.9/150.3 GFLOPs) and MCANet (71.2 M, 874.5/375.9 GFLOPs). On the MSAW dataset, DLiteNet achieves the highest accuracy (83.6% IoU, 91.1% F1-score), exceeding the best MCANet baseline by 1.0% IoU and 0.6% F1-score. 
Furthermore, deployment tests on the Jetson Orin NX edge device show that DLiteNet achieves a low inference latency of 14.97 ms per frame under FP32 precision, highlighting its real-time capability and deployment potential in edge computing scenarios.

  • New
  • Research Article
  • 10.1088/1361-6560/ae28b0
MRDT-GAN: generative adversarial network with multi-scale residual dense transformer generator for low-dose CT denoising.
  • Dec 5, 2025
  • Physics in medicine and biology
  • Shikai Guo + 2 more

Low-dose computed tomography (LDCT) reduces radiation exposure but introduces noise and artifacts that degrade diagnostic quality. Existing deep learning-based denoising methods still face challenges such as over-smoothing, loss of fine structures, and uneven contrast. This study aims to develop an LDCT denoising framework that enhances noise suppression while preserving anatomical details and structural fidelity. We propose a Multi-Scale Residual Dense Transformer Generative Adversarial Network (MRDT-GAN). In the generator, we adopt the Multi-Scale Residual Dense Transformer Block (MRDTB) as the core unit, which introduces a multi-scale strategy into the residual dense network to reduce over-smoothing and preserve fine details, and a Patching Transformer Block (PTB) to capture long-range dependencies, mitigating distortions caused by localized receptive fields in CNN-based approaches. A Hybrid Attention Module (HAM) is also introduced in the generator to process spatial, frequency, and contrast information, enabling the network to focus on critical regions for noise suppression, improve contrast uniformity, and maintain texture consistency. In the discriminator, we adversarially explore differences at the global, pixel, and sub-scale levels between denoised LDCT and normal-dose CT to better capture structural variations, reduce local noise and distortions, and ensure more realistic texture reconstruction while minimizing artifacts. We validate MRDT-GAN on both the NIH-AAPM-Mayo Clinic LDCT dataset and a real-world dataset. Experimental results indicate that MRDT-GAN achieves superior denoising performance compared with existing methods, effectively preserves details, enhances visual quality, and achieves a better balance between noise suppression and structural integrity. MRDT-GAN provides an effective and generalizable LDCT denoising solution that balances noise reduction with fine-detail preservation.
By integrating multi-scale residual dense Transformer modeling, hybrid attention mechanisms, and multi-difference adversarial learning, the framework offers improved clinical applicability and supports high-quality image reconstruction for downstream diagnostic tasks.

  • New
  • Research Article
  • 10.1002/ps.70424
Rapid detection of common scab, powdery scab, and enlarged lenticels in potato tubers using deep learning.
  • Dec 5, 2025
  • Pest management science
  • Jiale Lv + 6 more

Differentiating between potato common scab, powdery scab, and the physiological disorder of enlarged corky lenticels is challenging due to their similar visual symptoms. To address this, we propose YOLOv8-ST, an enhanced YOLOv8 model that integrates the Swin Transformer and Triplet Attention modules to distinguish between these visually similar tuber blemishes. Compared with the YOLOv3, YOLOv5, YOLOv6, and YOLOv8 baselines, YOLOv8-ST achieved the highest precision (0.903), recall (0.831), F1-score (0.866), mAP@0.5 (0.931), and mAP@0.5:0.95 (0.616), with strong performance in detecting common scab and powdery scab (both >0.9 at mAP@0.5 or precision). Detection outputs showed higher confidence (e.g., 0.94 for scab), fewer false positives, and no missed lesions, outperforming models prone to misclassification or overlap. The YOLOv8-ST model enables fast, accurate, and reliable detection of common scab, powdery scab, and enlarged lenticels on potato tubers. This field-deployable solution supports early disease diagnosis and timely intervention, thus reducing crop losses. The model is available through the mobile app Plant Guardian, enabling growers to identify potato skin blemishes directly in the field, thereby advancing both practical disease management and agricultural AI applications. © 2025 Society of Chemical Industry.
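The mAP@0.5 and mAP@0.5:0.95 figures count a detection as correct when its predicted box overlaps the ground-truth box above an IoU threshold. The underlying box IoU:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2):
    intersection area divided by union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1428...
```

mAP@0.5:0.95 averages the precision over IoU thresholds from 0.5 to 0.95 in steps of 0.05, so it rewards tighter boxes than mAP@0.5 alone.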

  • New
  • Research Article
  • 10.1038/s41598-025-30346-1
EfficientPoseSegNet: a weakly supervised, attention-guided framework for human pose estimation, anatomical segmentation, and concealed object detection in backscatter millimeter-wave security screening.
  • Dec 5, 2025
  • Scientific reports
  • Muhammad Zaheer Sajid + 4 more

Detecting concealed objects, anatomical keypoints, and segmenting human body parts in backscatter millimeter-wave images are essential tasks for enhancing airport security and transportation safety. However, the deployment of reliable automated systems is challenged by factors such as poor image quality, scarce detailed annotations, and stringent privacy regulations. To address these issues, we introduce EfficientPoseSegNet, a hybrid deep learning framework designed for efficient annotation use, concealed object detection, and body-aware analysis in security screening images. The model leverages parallel EfficientNet and DenseNet backbones to capture rich multi-scale features from low-resolution scans, refined using a Convolutional Block Attention Module (CBAM) to enhance focus on critical anatomical areas while reducing background noise. Instead of directly predicting joint coordinates, EfficientPoseSegNet outputs spatial heatmaps representing the confidence of anatomical keypoints. These keypoints are then extracted via soft-argmax and used to segment the human body into 17 anatomical regions. To further improve robustness and generalization under weak supervision, we incorporate a task-aware, confidence-weighted adaptation of Stochastic Weight Averaging (SWA), which stabilizes training and enhances multi-task performance. Additionally, we integrate an anomaly detection module that leverages anatomical segmentation features to identify concealed objects in body regions, addressing the ultimate operational goal of airport security screening. Tested on the Transportation Security Administration Passenger Screening Dataset, the model achieves a test loss of 3.9335, mean absolute error of 1.4330 pixels, keypoint accuracy of 99.79% within a 10-pixel threshold, pose estimation accuracy of 99%, segmentation Intersection over Union (IoU) of 97%, and an average anomaly detection AUC of 0.94 across body regions. 
This approach contributes significantly to Transportation Science and Logistics by enabling scalable, privacy-compliant, and real-time human pose estimation, body part segmentation, and concealed object detection in fast-paced airport screening settings.
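The soft-argmax step mentioned above turns a confidence heatmap into differentiable keypoint coordinates by taking a probability-weighted average of pixel positions. A minimal sketch (the temperature `beta` is an assumed knob, not a value from the paper):

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, beta: float = 10.0):
    """Differentiable keypoint extraction: softmax the heatmap, then take
    the probability-weighted average of the (row, col) coordinates."""
    h, w = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))   # stabilised softmax
    p /= p.sum()
    rows, cols = np.mgrid[0:h, 0:w]
    return float((p * rows).sum()), float((p * cols).sum())

hm = np.zeros((8, 8))
hm[5, 2] = 1.0                        # single confident peak
y, x = soft_argmax_2d(hm, beta=50.0)
print(round(y, 2), round(x, 2))       # 5.0 2.0
```

Unlike a hard argmax, this keeps gradients flowing through the coordinate estimate, which is what allows the heatmap head to be trained end to end.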

  • New
  • Research Article
  • 10.1177/14759217251393117
Fault diagnosis of transmission lines via adaptive modal filtering and an enhanced convolutional neural network synergistic approach
  • Dec 4, 2025
  • Structural Health Monitoring
  • Long Zhao + 3 more

To address the challenges of high concealability and difficulty in identifying minor damage in transmission lines, as well as low fault diagnosis accuracy under strong noise conditions, a novel fault diagnosis method for transmission lines is proposed. A data processing method based on adaptive modal filtering is proposed by combining a variational constraint model with an adaptive frequency band extraction strategy. Subsequently, by leveraging the concept of the generalized Fourier transform, pseudopeak effects near modal frequencies are suppressed, achieving thorough noise signal filtering without altering the intrinsic state characteristics of the transmission lines. For fault diagnosis, a convolutional neural network enhanced with an attention module is constructed, and a fault diagnosis model integrated with bidirectional long short-term memory (BiLSTM) is proposed. By embedding a convolutional block attention module, network weights are dynamically adjusted to enhance feature representation in both channel and spatial dimensions. Additionally, the introduction of BiLSTM strengthens the model’s ability to process time series data. Finally, the proposed method is validated on a conductor vibration test platform, demonstrating its high diagnostic accuracy and superior performance in noisy environments compared with other models.
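The convolutional block attention module (CBAM) embedded here applies channel attention and then spatial attention in sequence. A shape-level numpy sketch that omits the learned shared MLP and 7×7 convolution of the full module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x: np.ndarray) -> np.ndarray:
    """Weight each channel by its pooled statistics (learned MLP omitted)."""
    avg = x.mean(axis=(1, 2))          # (C,)
    mx = x.max(axis=(1, 2))            # (C,)
    return x * sigmoid(avg + mx)[:, None, None]

def spatial_attention(x: np.ndarray) -> np.ndarray:
    """Weight each location by cross-channel statistics (7x7 conv omitted)."""
    avg = x.mean(axis=0)               # (H, W)
    mx = x.max(axis=0)                 # (H, W)
    return x * sigmoid(avg + mx)[None, :, :]

def cbam(x: np.ndarray) -> np.ndarray:
    """CBAM applies channel attention, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(x))

x = np.random.default_rng(1).normal(size=(16, 8, 8))   # (C, H, W) feature map
print(cbam(x).shape)                                   # (16, 8, 8)
```

The output keeps the input shape, so the module can be dropped into an existing CNN without changing downstream layers.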

  • New
  • Research Article
  • 10.1088/1361-6501/ae2353
A convolutional neural network with attention module for subpixel localization and diameter estimation in dense particle detection
  • Dec 4, 2025
  • Measurement Science and Technology
  • Xiang Cai + 3 more

Image-based two-dimensional particle identification technology serves as the foundation for three-dimensional reconstruction and particle tracking velocimetry (3D-PTV), playing a crucial role in flow field studies using tracer particles. The integration of subpixel-accuracy particle coordinate identification with particle diameter detection not only facilitates the matching process in 3D reconstruction but also significantly expands its application scope. This approach proves particularly valuable in scenarios involving wide particle size distributions or diameter-dependent phenomena, such as droplet breakup processes. However, existing methods exhibit limited performance in high-particle-density scenarios. In the current study, a U-Net network enhanced by the Convolutional Block Attention Module is proposed, which can simultaneously output particle coordinates and diameters in a single-stage framework. A geometry-constrained postprocessing procedure tailored to the model's characteristics is introduced, substantially reducing artifact rates and improving identification quality. The model is trained on a synthetic dataset constructed using Gaussian particle models, incorporating variations in particle size, brightness, and noise. It achieves 70% recall in high-overlap scenarios (0.10 particles per pixel with 3-5 px diameters), exhibiting strong adaptability to high-density overlapping particles and wide size distributions, along with robust noise resistance. The proposed algorithm is further applied to experimental images obtained from two real-world scenarios: single droplet impact on rotating blades and tracer-particle-laden liquid flows. Comparative tests on both synthetic datasets and experimental images demonstrate that the proposed method significantly outperforms the traditional Gaussian-fitting-based TrackPy approach.
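The synthetic training data described above renders particles as 2-D Gaussian blobs. A minimal sketch of such a generator (the sigma = diameter/4 convention and the function name are assumptions, not details from the paper):

```python
import numpy as np

def render_particles(shape, centers, diameters, intensity=1.0):
    """Render synthetic tracer particles as 2-D Gaussian blobs; sigma is
    taken as diameter/4 so most of the blob falls inside the diameter."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    img = np.zeros(shape)
    for (cy, cx), d in zip(centers, diameters):
        sigma = d / 4.0
        img += intensity * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return img

# Subpixel centers exercise the subpixel localization the model must learn
img = render_particles((32, 32), centers=[(10.5, 12.2), (20.0, 20.0)], diameters=[4, 5])
print(img.shape)  # (32, 32)
```

Adding noise and varying brightness per particle, as the paper describes, would be a straightforward extension of this loop.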

  • New
  • Research Article
  • 10.1088/2057-1976/ae2337
Towards interpretable and edge-intelligent masseter monitoring: a self-powered framework for on-device and continuous assessment
  • Dec 4, 2025
  • Biomedical Physics & Engineering Express
  • Boyu Li + 2 more

Continuous and interpretable monitoring of masseter muscle activity is essential for the assessment of sleep bruxism (SB) and temporomandibular dysfunction (TMD). However, existing surface electromyography (sEMG) systems remain constrained by wired power supply, data-privacy concerns, and limited real-time specificity. To address these gaps, this study introduces a self-powered, edge-intelligent monitoring framework that combines poly(vinylidene fluoride) (PVDF)-based piezoelectric patches (BP-Patch) with a dual-branch lightweight neural network, the Depthwise Separable Convolutional Network with Efficient Channel Attention (DSC-AttNet). The network leverages depthwise separable convolution (DSC) to balance computational load and feature resolution, and incorporates an Efficient Channel Attention (ECA) module to enhance the discriminability between lateralised activations. After 8-bit quantisation, DSC-AttNet is deployed on an Arm Cortex-M4 microcontroller (MCU) while occupying only 80.7 KiB Flash and 72.8 KiB RAM, enabling real-time on-device inference across five physiological states (left/right bruxism, left/right chewing, and resting) with 94.75% classification accuracy and 63.6 ms average latency on data from 12 subjects. To support trustworthy AI-driven decision-making, Gradient-weighted Class Activation Mapping (Grad-CAM) and attention-based relevance analysis are employed to identify class-specific activation patterns across both time and frequency domains. These interpretable features further enable the derivation of clinically relevant indices such as nightly bruxism count, episode duration, and the Masseter Symmetry Index (MSI). By integrating bilateral self-powered sensing, resource-efficient edge inference, and quantitative interpretability within a fully on-device framework, this work lays the groundwork for long-term, home-based assessment and privacy-preserving intervention in masseter monitoring.
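The Efficient Channel Attention (ECA) module scores channels with a global average pool followed by a small 1-D convolution across neighbouring channels, avoiding the dimensionality reduction of heavier attention blocks. A sketch with uniform (untrained) kernel weights for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Efficient Channel Attention: global average pool to one value per
    channel, a k-tap 1-D convolution across neighbouring channels (uniform
    weights here; learned in the real module), then a sigmoid gate."""
    gap = x.mean(axis=(1, 2))                         # (C,)
    kernel = np.ones(k) / k
    mixed = np.convolve(gap, kernel, mode="same")     # local cross-channel mixing
    return x * sigmoid(mixed)[:, None, None]

x = np.random.default_rng(2).normal(size=(8, 4, 4))   # (C, H, W)
print(eca(x).shape)                                   # (8, 4, 4)
```

Because the only learnable part is a k-tap kernel, the module adds almost no parameters, which is what makes it attractive for the quantised MCU deployment described above.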

  • New
  • Research Article
  • 10.1117/1.jmi.12.6.064004
Multiscale attention network with structure guidance for colorectal polyp segmentation.
  • Dec 4, 2025
  • Journal of medical imaging (Bellingham, Wash.)
  • Yang Yang + 4 more

Accurate segmentation and precise delineation of colorectal polyp structures are crucial for early clinical diagnosis and treatment planning. However, existing polyp segmentation techniques face significant challenges due to the high variability in polyp size and morphology, as well as the frequent indistinctness of polyp-tissue structures. To address these challenges, we propose a multiscale attention network with structure guidance (MAN-SG). The core of MAN-SG is a structure extraction module (SEM) designed to capture rich structural information from fine-grained early-stage encoder features. In addition, we introduce a cross-scale structure guided attention (CSGA) module that effectively fuses multiscale features under the guidance of the structural information provided by the SEM, thereby enabling more accurate delineation of polyp structures. MAN-SG is implemented and evaluated using two high-performance backbone networks: Res2Net-50 and PVTv2-B2. Extensive experiments were conducted on five benchmark datasets for polyp segmentation. The results demonstrate that MAN-SG consistently outperforms existing state-of-the-art methods across these datasets. The proposed MAN-SG framework, which leverages structural guidance via SEM and CSGA modules, proves to be both highly effective and robust for the challenging task of colorectal polyp segmentation.

  • New
  • Research Article
  • 10.1088/2631-8695/ae281f
Lightweight pedestrian detection utilizing multi-modal image fusion with dynamic multi-weight adjustment
  • Dec 4, 2025
  • Engineering Research Express
  • Yemei Sun + 3 more

To tackle the challenges of low pedestrian detection accuracy and substantial model parameters in low-light conditions, this paper presents a lightweight multi-modal object detection network based on the YOLOv8 framework. The goal is to fully leverage the complementary characteristics of infrared and visible light images to enhance detection accuracy and robustness. Initially, dual feature extraction branches are established for infrared and visible light images, with modal weights determined adaptively based on the characteristics of the input images. This adaptive mechanism effectively leverages the complementary information from the distinct modalities, thereby enhancing detection accuracy. A local illumination perception module detects local illumination variations in input images and dynamically adjusts the fusion weights of the multimodal features, which in turn enhances detection performance under complex lighting conditions. Specifically, to further enhance channel attention without introducing extra parameters, a parameter-free channel attention module has been developed. This module employs global max and average pooling to evaluate channel importance, optimizing fused channel features and enhancing recognition accuracy and robustness in complex scenes. Experiments on the LLVIP dataset demonstrate that our method achieves 98.1% mAP50 and 68.4% mAP at 238 FPS, requiring six to thirteen times fewer parameters than recent state-of-the-art methods while maintaining competitive accuracy, making it highly suitable for resource-constrained real-time applications.
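The illumination-driven fusion of the two branches can be sketched as a per-pixel convex combination of visible and infrared features: bright regions trust the visible branch, dark regions the infrared one. The weighting rule below is an assumption, since the abstract does not give the exact formula:

```python
import numpy as np

def illumination_weighted_fusion(vis, ir, illum):
    """Fuse visible and infrared feature maps with a per-pixel weight
    driven by local illumination. `illum` in [0, 1] is a brightness
    estimate; in the real model the weight would be a learned function
    of it rather than the estimate itself."""
    return illum * vis + (1.0 - illum) * ir

vis = np.ones((4, 4))                  # stand-in visible-branch features
ir = np.zeros((4, 4))                  # stand-in infrared-branch features
illum = np.full((4, 4), 0.25)          # mostly dark scene
fused = illumination_weighted_fusion(vis, ir, illum)
print(float(fused.mean()))             # 0.25
```

In a dark scene the fused features lean toward the infrared branch, which is exactly the behaviour the local illumination perception module is meant to produce.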

  • New
  • Research Article
  • 10.1038/s41598-025-26962-6
Multi-branch low-light image iterative enhancement network.
  • Dec 3, 2025
  • Scientific reports
  • Yiwen Dou + 4 more

Images captured at night or under low-light conditions often suffer from insufficient brightness, low resolution, and detail loss. Although numerous deep learning-based methods have been proposed, most rely on direct mappings from low-illumination to normal-illumination images, which struggle to adapt to diverse real-world conditions. To address these challenges, this paper proposes a Multi-Branch Low-Light Image Iterative Enhancement Network (MBLLIE-Net). Specifically, to enhance feature extraction at different levels, our framework adopts a multi-branch architecture, in which features of various depths and scales extracted by the encoder are processed and refined through multiple parallel branches. To overcome the limitation of insufficient spatial dependency modeling, we introduce a Spatial Recurrent Unit (SRU) within each branch, which effectively captures long-range spatial relationships while preserving local details. Furthermore, to better emphasize salient channels across varying feature dimensions, we propose an Adaptive Receptive Field Channel Attention (ARFCA) module that dynamically adjusts its receptive field according to the channel dimension, enabling precise feature selection with negligible computational overhead. Finally, the decoder fuses the outputs from all branches to generate an initial enhanced result, which is iteratively refined by concatenating it with the original input, ensuring progressive improvement in image quality. Extensive experiments demonstrate that MBLLIE-Net effectively restores illumination, detail, and color fidelity across a wide range of low-light scenarios, outperforming existing single-path approaches in both quantitative metrics and human perceptual evaluations.

  • New
  • Research Article
  • 10.36922/jse025400081
U-STDRNet: A unified model integrating swin transformer and residual dense network for seismic image super-resolution and denoising
  • Dec 3, 2025
  • Journal of Seismic Exploration
  • Mingliao Wu + 5 more

Enhancing seismic image resolution while effectively suppressing noise remains a critical challenge in accurately characterizing subsurface geological structures for oil and gas exploration. Traditional methods often fail to balance the recovery of fine details with robustness to noise, particularly in complex geological settings or under high-noise conditions. This study proposes a deep learning-based joint model, U-Net Shifted Window (Swin) Transformer-based dense residual network (U-STDRNet). The model integrates the global modeling capability of the Swin Transformer, the hierarchical feature reuse mechanism of the residual dense network, and an attention-guided strategy to jointly perform seismic image super-resolution and denoising. Built upon the U-Net encoder-decoder architecture, the model embeds Swin Transformer-based convolutional residual blocks. These blocks employ both a feature fusion block with the Swin Transformer and a feature fusion block with a convolutional neural network to effectively capture stratigraphic continuity and enhance detailed features such as fault edges. Residual dense blocks further improve weak signal recovery (e.g., thin-layer interfaces) through dense residual connections. Furthermore, the convolutional block attention module is integrated into skip connections, employing a dual-channel spatial weighting mechanism to suppress noise and emphasize key geological regions. Experimental results and field-data experiments demonstrate that U-STDRNet achieves a higher peak signal-to-noise ratio than the traditional U-Net. In addition, the model successfully restores fault and fold continuity details while exhibiting superior noise suppression compared to existing methods.
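Peak signal-to-noise ratio, the headline metric above, is a log-scaled inverse of mean squared error against the reference image:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: higher means the enhanced image
    is closer to the reference; infinite for an exact match."""
    mse = float(np.mean((ref - test) ** 2))
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1                      # uniform error of 0.1
print(round(psnr(ref, noisy), 1))      # 20.0
```

Every 10 dB of PSNR gain corresponds to a tenfold reduction in mean squared error, which is why even a few dB of improvement over U-Net is a meaningful result.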

  • New
  • Research Article
  • 10.1109/tpami.2025.3632829
Grid Convolution for 3D Human Pose Estimation.
  • Dec 3, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Yangyuxuan Kang + 6 more

3D human pose estimation from 2D keypoint observation has been used in many human-centered computer vision applications. In this work, we tackle the task by formulating a novel grid representation learning paradigm that relies on grid convolution (GridConv), mimicking the wisdom of regular convolution operations in image space. GridConv is defined based on Semantic Grid Transformation (SGT) which leverages a binary assignment matrix to map standard skeleton 2D pose onto a regular weave-like grid pose joint by joint. We provide two ways to implement SGT: handcrafted and learnable SGT. Surprisingly, both designs turn out to achieve promising results and the learnable one is better, demonstrating the great potential of this new lifting representation learning formulation. To improve the ability of GridConv to encode contextual cues, we introduce an attention module over the convolutional kernel, making grid convolution operations input-dependent, spatial-aware and grid-specific. Besides our spatial grid lifting network for single-frame input, we also present a spatial-temporal grid lifting network for video-based input, which relies on an efficient multi-scale grid learning strategy to encode spatial and temporal joint variations. Extensive experiments demonstrate that the proposed grid lifting network outperforms existing approaches by remarkable margins on Human3.6M and MPI-INF-3DHP datasets. Our grid lifting networks also exhibit good generalization ability across three other keypoint-based tasks: 3D hand pose estimation, head pose estimation, and action recognition.
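The Semantic Grid Transformation maps the J skeleton joints onto a regular grid through a binary assignment matrix, so that ordinary 2-D convolutions can run over the grid pose. A minimal sketch (the 2×2 grid and the joint-to-cell mapping are illustrative; the paper uses larger weave-like grids and can also learn the assignment):

```python
import numpy as np

def semantic_grid_transform(pose2d: np.ndarray, assign: np.ndarray) -> np.ndarray:
    """Map a (J, 2) skeleton pose onto an (H, W, 2) grid pose: assign[i, j]
    is a one-hot vector over joints selecting which joint fills cell (i, j)."""
    return np.einsum("hwj,jc->hwc", assign, pose2d)

J = 4
pose = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # (J, 2)
assign = np.zeros((2, 2, J))
for cell, joint in enumerate([0, 1, 2, 3]):     # handcrafted cell -> joint map
    assign[cell // 2, cell % 2, joint] = 1.0
grid_pose = semantic_grid_transform(pose, assign)
print(grid_pose.shape)                          # (2, 2, 2)
```

Once the pose lives on a grid, "grid convolution" can share weights across neighbouring joints the same way image convolution shares weights across neighbouring pixels.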

  • New
  • Research Article
  • 10.1371/journal.pone.0335745
Deep spatial attention networks for vision-based pavement distress perception in autonomous driving
  • Dec 3, 2025
  • PLOS One
  • Fuwen Deng + 1 more

Ensuring the safety and comfort of autonomous driving relies heavily on accurately perceiving the quality of the road pavement surface. However, current research has primarily focused on perceiving traffic participants such as surrounding vehicles and pedestrians, with relatively limited investigation into road surface quality perception. This paper addresses this gap by proposing a high-performance semantic segmentation method that utilizes real-time road images captured by an onboard camera to monitor the category and position of road defects ahead of the ego vehicle. Our approach introduces a novel multi-scale spatial attention module to enhance the accuracy of detecting road surface damage within the traditional semantic segmentation framework. To evaluate the proposed approach, we curated and utilized a dataset comprising 2,400 annotated images for model training and validation. Experimental results demonstrate that our method achieves a superior balance between detection precision and computational efficiency, outperforming existing semantic segmentation models in terms of mean IoU while maintaining low computational cost and high inference speed. This approach holds great potential for application in vision-based autonomous driving as it can be seamlessly integrated with appropriate control strategies, thereby offering passengers a smooth and reliable driving experience.
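Segmentation quality above is reported as mean IoU. A minimal sketch of that metric over label maps (the class ids and arrays are hypothetical, not the paper's evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skip rather than score it
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1], [0, 1, 1]])
target = np.array([[0, 0, 1], [1, 1, 1]])
# class 0: inter 2, union 3; class 1: inter 3, union 4 -> mean of 2/3 and 3/4
assert abs(mean_iou(pred, target, num_classes=2) - (2/3 + 3/4) / 2) < 1e-9
```

Averaging per-class IoU rather than per-pixel accuracy keeps rare defect classes from being drowned out by the dominant road-surface class.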

  • New
  • Research Article
  • 10.1109/tpami.2025.3639595
Alliance: All-in-One Spectral-Spatial-Frequency Awareness Foundation Model.
  • Dec 3, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Boyu Zhao + 7 more

Frequency domain analysis reveals fundamental image patterns difficult to observe in raw pixel values, while avoiding redundant information in original image processing. Although recent remote sensing foundation models (FMs) have made progress in leveraging spatial and spectral information, they have limitations in fully utilizing frequency characteristics that capture hidden features. Existing FMs that incorporate frequency properties often struggle to maintain connections with the original image content, creating a semantic gap that affects downstream performance. To address these challenges, we propose the All-in-One Spectral-Spatial-Frequency Awareness Foundation Model (Alliance), a framework that effectively integrates information across all three domains. Alliance introduces several key innovations: (1) a progressive frequency decoding mechanism inspired by human visual cognition that minimizes multi-domain information gaps while preserving connections between general image information and frequency characteristics, progressively reconstructing from low to mid to high frequencies to extract patterns difficult to observe in raw pixel values; (2) a triple-domain fusion attention module that separately processes amplitude, phase, and spectral-spatial relationships for comprehensive feature integration; and (3) frequency embedding with frequency-aware CLS token initialization and frequency-specific mask token initialization that achieves fine-grained modeling of different frequency band information. Additionally, to evaluate the generalizability of FMs, we construct the Yellow River dataset, a large-scale multi-temporal collection that introduces challenging cross-domain tasks and establishes more rigorous standards for FM assessment. Extensive experiments across six downstream tasks demonstrate Alliance's superior performance.
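The low-to-mid-to-high progressive reconstruction described above presupposes splitting an image into frequency bands. A minimal sketch of such a radial band split via the FFT (the cutoff radii and masking scheme are assumptions for illustration, not Alliance's design):

```python
import numpy as np

def frequency_bands(img, cutoffs=(0.1, 0.3)):
    """Split a 2D image into low/mid/high radial frequency bands via FFT masks."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]      # per-row normalized frequencies
    fx = np.fft.fftfreq(w)[None, :]      # per-column normalized frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)  # distance from the DC component
    spectrum = np.fft.fft2(img)
    low_c, mid_c = cutoffs
    masks = [radius <= low_c,
             (radius > low_c) & (radius <= mid_c),
             radius > mid_c]
    return [np.fft.ifft2(spectrum * m).real for m in masks]

rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, mid, high = frequency_bands(img)
# The three masks partition the spectrum, so the bands sum back to the image.
assert np.allclose(low + mid + high, img)
```

Because the bands tile the spectrum exactly, reconstructing them in sequence (low first, high last) recovers coarse layout before fine texture, mirroring the coarse-to-fine intuition the abstract attributes to human visual cognition.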

  • New
  • Research Article
  • 10.1002/rob.70117
Advanced Deep Learning Architecture for Real‐Time Online Train Tread Segmentation and Wear Detection
  • Dec 2, 2025
  • Journal of Field Robotics
  • Qichang Guo + 2 more

Train tread wear constitutes a critical factor compromising railway operational safety. Current deep learning–based image processing approaches predominantly analyze data acquired from area‐scan cameras and fail to simultaneously address detection accuracy and processing speed, thus hindering practical deployment in industrial settings. Furthermore, train tread centerline extraction remains virtually unexplored, despite its potential utility as key auxiliary information for rail condition monitoring. To address these limitations, we propose a real‐time train tread wear detection system. First, tread region segmentation is implemented using a Light‐DeepLabV3+ architecture, enhanced through backbone network replacement and attention mechanism integration. Additionally, we introduce a lightweight ASPP‐L module that reduces model complexity while preserving segmentation accuracy. Second, a camera calibration–based method extracts the tread centerline. Finally, we present an Improved‐YOLOv8l model incorporating large‐scale feature maps and attention modules, which leverages the extracted centerline information for rapid and precise wear detection. Experimental validation demonstrates that our system achieves 45 FPS with 93.1% mAP on an NVIDIA RTX 3060 GPU, exceeding real‐time railway inspection requirements while maintaining state‐of‐the‐art detection performance.
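ASPP modules such as the lightweight ASPP-L above rely on atrous (dilated) convolutions, which enlarge the receptive field without adding weights; the standard relation is k_eff = k + (k - 1)(d - 1) for kernel size k and dilation rate d. A small check of that arithmetic (the rates below are generic examples, not the paper's configuration):

```python
def effective_kernel(k, dilation):
    """Effective spatial extent of a k x k convolution with the given dilation."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at increasing atrous rates covers ever larger contexts
# while its weight count (3*3 per channel pair) stays fixed.
for rate, extent in [(1, 3), (2, 5), (4, 9), (8, 17)]:
    assert effective_kernel(3, rate) == extent
```

Running several such rates in parallel and fusing the results is what lets an ASPP-style head see both narrow wear marks and wide tread context at once.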

  • New
  • Research Article
  • 10.1038/s41598-025-29315-5
Multiscale scene parsing network.
  • Dec 2, 2025
  • Scientific reports
  • Yuanyuan Wang + 6 more

To address the core challenge faced by existing lightweight scene parsing networks, namely balancing multiscale feature representation precision with computational efficiency, this paper proposes MSPNet, a lightweight multiscale scene parsing network. The network adopts StarNet as the backbone to leverage its efficient low-to-high dimensional feature transformation capability, and embeds the Efficient Pixel Localization Attention (EPLA) module into the PSPNet architecture. Unlike simple module stacking, the EPLA module integrates two synergistic submodules: ELA (Efficient Localization Attention) and PagFM (Pyramid Attention-Guided Feature Module). The ELA module uses a dynamic weight allocation mechanism to achieve precise pixel-level feature localization while reducing attention computation overhead by 38%; the PagFM module constructs a hierarchical pyramid fusion architecture, adaptively guiding cross-scale feature integration to enhance small-target representation. Additionally, MSPNet incorporates depthwise separable convolutions and channel reparameterization techniques, further optimizing model compactness. Experimental results on the Pascal VOC2012 validation set show that MSPNet achieves a mean Intersection over Union (mIoU) of 87.19%, a 1.79% improvement over PSPNet. With GFLOPs (9.7G for the StarNet-s4 backbone) and parameter counts (7.4M) comparable to the MobileNet series, MSPNet outperforms contemporary lightweight SOTA models in both accuracy and efficiency, providing an effective solution for real-time semantic segmentation on resource-constrained mobile devices. The code for MSPNet is available at https://github.com/Eric-863/MSPnet.
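The depthwise separable convolutions MSPNet uses for compactness replace one dense convolution with a depthwise plus pointwise pair. A quick parameter-count comparison (bias terms omitted; the layer sizes are illustrative, not MSPNet's):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Example layer: 128 -> 256 channels with 3x3 kernels.
dense = conv_params(128, 256, 3)                     # 294,912 weights
separable = depthwise_separable_params(128, 256, 3)  # 33,920 weights
assert dense == 294_912 and separable == 33_920
assert dense / separable > 8  # roughly a k*k-fold saving when c_out is large
```

Stacking many such layers is how a network can stay near MobileNet-scale parameter budgets while keeping a deep multiscale feature hierarchy.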

Copyright 2025 Cactus Communications. All rights reserved.

Privacy PolicyCookies PolicyTerms of UseCareers