Articles published on Perceptual Attention
214 Search results
- Research Article
- 10.3758/s13421-026-01854-w
- Mar 18, 2026
- Memory & Cognition
- Gordon D Logan + 4 more
Eight experiments tested the common idea that memory retrieval is attention turned inward by adapting a perceptual attention task to short-term and long-term memory paradigms to measure the focus of attention on memory. The resulting position-cued recognition task produces a distance effect that defines the sharpness of the focus of attention on memory. In theory, the effect depends on the similarity of memory probes to the cued item and its neighbors on the list and not the memory store that holds the list. The experiments asked whether the focused-attention distance effect would be observed in both long-term and short-term memory paradigms. Experiments 1-3 defined long-term memory operationally as "that which survives distracting tasks" and found distance effects after two, four, or six distracting arithmetic problems that were similar to distance effects with no distraction. Experiments 4-6 defined long-term memory operationally as "that which was trained" and found distance effects after 0, 10, and 20 pre-training trials that were similar to distance effects with novel items. Experiments 7 and 8 defined long-term memory operationally as "that which contains pre-experimental knowledge," and found distance effects when subjects recognized probe letters in spoken words and in the names of pictures they identified. The results support the hypothesis that retrieval from short-term and long-term memory both require attention turned inward. They support the hypothesis that attentional selection, turned inward or outward, depends more on the similarity structure of the list and the probe than the memory store that holds the list.
- Research Article
- 10.1088/1361-6501/ae4642
- Mar 3, 2026
- Measurement Science and Technology
- Ping Ding + 1 more
Accurately predicting the remaining useful life of bearings is of great importance for enabling intelligent operation and health management of industrial equipment. Existing methods often face challenges in jointly modeling degradation trends and transient anomalies. Furthermore, their integration of physical priors remains underexplored, limiting the reliability of predictions. To address these challenges, this paper proposes a novel bearing life prediction framework, termed the frequency-domain perception attention based dual-path network with physics-informed loss (FDPN). The model employs a dual-encoder architecture to separately extract low-frequency degradation-trend features and high-frequency transient-anomaly features, which are adaptively fused via a gated fusion module. Furthermore, a wavelet-based frequency-domain perceptual attention mechanism is designed to explicitly model multi-scale frequency characteristics in vibration signals. Moreover, a physics-informed loss function is constructed by incorporating trend-consistency constraints and an asymmetric error-penalty mechanism, ensuring that the predictions conform to degradation patterns while maintaining safety and reliability. Experimental results on the PHM2012 bearing dataset demonstrate that the proposed FDPN model outperforms mainstream methods, achieving an 8.43% reduction in root mean square error and a 10.70% improvement in safety score (SCORE) compared to the best baseline. These improvements validate FDPN’s superior performance in prediction accuracy, reliability, and engineering safety.
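The abstract above names two ingredients of its physics-informed loss, a trend-consistency constraint and an asymmetric error penalty, without giving formulas. The following is only a minimal sketch of what such terms could look like, not the authors' loss: the function names, the squared-error form, and the `late_weight` parameter are all assumptions made for illustration.

```python
def asymmetric_rul_penalty(predicted, actual, late_weight=2.0):
    """Mean squared error in which over-estimates of remaining useful life
    (predicting failure later than it happens) are weighted by `late_weight`.
    Hypothetical form; the paper's actual penalty is not given in the abstract."""
    total = 0.0
    for p, a in zip(predicted, actual):
        err = p - a
        # err > 0 means the model predicts more remaining life than is left.
        weight = late_weight if err > 0 else 1.0
        total += weight * err * err
    return total / len(predicted)


def trend_consistency_penalty(predicted):
    """Sum of squared increases between consecutive predictions.
    Assumes remaining useful life should never rise as a bearing degrades."""
    return sum(max(0.0, nxt - cur) ** 2
               for cur, nxt in zip(predicted, predicted[1:]))
```

With `late_weight=2.0`, an over-estimate of 2 time units costs twice as much as an under-estimate of the same size, which reflects the safety argument that a late failure prediction is the more dangerous error.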
- Research Article
- 10.1142/s0218126626501501
- Feb 6, 2026
- Journal of Circuits, Systems and Computers
- Gongwen Li + 5 more
The widespread deployment of automated guided vehicles (AGVs) in dynamic complex environments poses a critical challenge in achieving efficient obstacle avoidance and energy optimization. This paper proposes an observation-enhanced reinforcement learning with reward constraints algorithm (ORLRCA) to enhance the safety and energy efficiency of AGV obstacle avoidance in complex scenarios. First, a multimodal perceptual attention mechanism is introduced to dynamically capture obstacle motion patterns and environmental semantic features, thereby enhancing scene perception capabilities. Second, a multi-scale prior reward-constrained function is designed to jointly optimize safety distance, path smoothness, and energy consumption metrics, effectively addressing the suboptimal convergence caused by conflicting objectives in RL strategies. Finally, leveraging the actor-critic network architecture of proximal policy optimization (PPO), we achieve end-to-end optimization of robust obstacle avoidance policy generation through synergistic integration of attention-enhanced state representations and multi-scale reward signals. To validate efficacy, a high-fidelity simulation environment is developed for comparative experiments. Results demonstrate that the proposed algorithm exhibits superior performance in obstacle avoidance success rate and energy efficiency compared to baseline RL methods, establishing a theoretical foundation for secure deployment and energy-efficient management of AGVs in practical industrial scenarios.
- Research Article
- 10.1109/lra.2026.3669796
- Jan 1, 2026
- IEEE Robotics and Automation Letters
- Yebei Wen + 5 more
Visual-inertial odometry (VIO) serves as a dominant framework for real-time motion state estimation in micro aerial vehicles (MAVs). However, existing VIO techniques remain highly susceptible to high dynamic range (HDR) illumination conditions. Bio-inspired by the lateralized visual system of the Strawberry Squid, we propose an SSS-VIO technique that integrates a functionally asymmetric-eye design and perceptual attention mechanisms to overcome HDR-induced limitations. The proposed method is applicable both to the design of asymmetric stereo platforms and to cost-effective retrofitting of commercial stereo cameras. Experimental evaluations on an OAK-4P-New camera demonstrate a 40% increase in usable dynamic range with a single neutral density filter. Furthermore, SSS-VIO outperforms multiple state-of-the-art VIO methods in MAV flight tests, reducing Absolute Trajectory Error by at least 46.1% and consistently improving Relative Pose Error across different trajectory segments. To the best of our knowledge, this is the first bio-inspired asymmetric-eye solution for enhancing the VIO dynamic range, which might unlock new potential for MAV indoor-outdoor localization and applications.
- Research Article
- 10.3390/electronics14224527
- Nov 19, 2025
- Electronics
- Zikang Zhang + 1 more
Infrared small target detection (IRSTD) is hindered by low signal-to-noise ratios, minute object scales, and strong target–background similarity. Although long-range skip fusion is exploited in SCTransNet, the global context is insufficiently captured by its convolutional encoder, and the fusion block remains vulnerable to structured clutter. To address these issues, a Mamba-enhanced framework, MixMambaNet, is proposed with three mutually reinforcing components. First, ResBlocks are replaced by a perception-aware hybrid encoder, in which local perceptual attention is coupled with mixed pixel–channel attention along multi-branch paths to emphasize weak target cues while modeling image-wide context. Second, at the bottleneck, dense pre-enhancement is integrated with a selective-scan 2D (SS2D) state-space (Mamba) core and a lightweight hybrid-attention tail, enabling linear-complexity long-range reasoning that is better suited to faint signals than quadratic self-attention. Third, the baseline fusion is substituted with a non-local Mamba aggregation module, where DASI-inspired multi-scale integration, SS2D-driven scanning, and adaptive non-local enhancement are employed to align cross-scale semantics and suppress structured noise. The resulting U-shaped network with deep supervision achieves higher accuracy and fewer false alarms at a competitive cost. Extensive evaluations on NUDT-SIRST, NUAA-SIRST, and IRSTD-1k demonstrate consistent improvements over prevailing IRSTD approaches, including SCTransNet.
- Research Article
2
- 10.1088/1361-6501/ae08d7
- Oct 8, 2025
- Measurement Science and Technology
- Shuai Hao + 5 more
To address the problem of low accuracy in transmission line fault detection caused by multi-scale target faults in complex backgrounds, a novel approach named DM-YOLO is proposed. Firstly, to address the challenge of effectively extracting features from multi-scale target faults, a dynamic multi-scale convolution module was designed and introduced into the original YOLOv8 network, enhancing the model’s ability to express features at different scales. Secondly, a multi-dimensional perceptual attention module was proposed and embedded into the feature extraction network, improving detection accuracy by obtaining the correlation and global information between different regions of the feature image. Thirdly, to address the problems of missed and false detections caused by inefficient fusion of features at different levels, a multi-head feature fusion module was designed and introduced into the feature fusion network, which enhances the detection network’s comprehension of both semantic and textural information. Finally, to evaluate the algorithm’s performance, a dataset containing twelve types of fault samples was established, and comparative experiments were performed with other classic detection algorithms. The experimental results indicate that the enhanced model achieves an average accuracy of 93.8%, surpassing that of the original model. Furthermore, the proposed model demonstrates a high detection accuracy for multi-scale target faults within complex backgrounds.
- Research Article
- 10.3390/electronics14193934
- Oct 3, 2025
- Electronics
- Mingchen Dai + 1 more
Impurities in polypropylene random copolymer (PPR) raw materials can seriously affect the performance of the final product, and efficient and accurate impurity detection is crucial to ensure high production quality. In order to solve the problems of high small-target miss rates, weak anti-interference ability, and difficulty in balancing accuracy and speed in existing detection methods used in complex industrial scenarios, this paper proposes an enhanced machine vision detection algorithm based on YOLOv11. Firstly, the FasterLDConv module dynamically adjusts the position of sampling points through linear deformable convolution (LDConv), which improves the feature extraction ability for small-scale targets on complex backgrounds while remaining lightweight. Secondly, the IR-EMA attention mechanism combines an efficient reverse residual architecture with multi-scale attention. This combination enables the model to jointly capture feature channel dependencies and spatial relationships, thereby enhancing its sensitivity to weak impurity features. Thirdly, a DC-DyHead deformable dynamic detection head is constructed, and deformable convolutions are embedded into the spatial perceptual attention of DyHead to enhance its feature modelling ability for anomalies and occluded impurities. Finally, we introduce an enhanced InnerMPDIoU loss function to optimise the bounding box regression strategy. This loss addresses issues related to traditional CIoU losses, including excessive penalties imposed on small targets and a lack of sufficient gradient guidance in situations where there is almost no overlap. The results indicate that the average precision (mAP@0.5) of the improved algorithm on the self-made PPR impurity dataset reached 88.6%, which is 2.3% higher than that of the original YOLOv11n, while precision (P) and recall (R) increased by 2.4% and 2.8%, respectively. This study provides a reliable technical solution for the quality inspection of PPR raw materials and serves as a reference for algorithm optimisation in the field of industrial small-target detection.
- Research Article
1
- 10.3390/brainsci15101053
- Sep 27, 2025
- Brain Sciences
- Annie Tremblay + 1 more
Background/Objectives: Speech perception is shaped by language experience, with listeners learning to selectively attend to acoustic cues that are informative in their language. This study investigates how language dominance, a proxy for long-term language experience, modulates cue weighting in highly proficient Spanish–English bilinguals’ perception of English lexical stress. Methods: We tested 39 bilinguals with varying dominance profiles and 40 monolingual English speakers in a stress identification task using auditory stimuli that independently manipulated vowel quality, pitch, and duration. Results: Bayesian logistic regression models revealed that, compared to monolinguals, bilinguals relied less on vowel quality and more on pitch and duration, mirroring cue distributions in Spanish versus English. Critically, cue weighting within the bilingual group varied systematically with language dominance: English-dominant bilinguals patterned more like monolingual English listeners, showing increased reliance on vowel quality and decreased reliance on pitch and duration, whereas Spanish-dominant bilinguals retained a cue weighting that was more Spanish-like. Conclusions: These results support experience-based models of speech perception and provide behavioral evidence that bilinguals’ perceptual attention to acoustic cues remains flexible and dynamically responsive to long-term input. These results are in line with a neurobiological account of speech perception in which attentional and representational mechanisms adapt to changes in the input.
- Research Article
1
- 10.3390/electronics14193773
- Sep 24, 2025
- Electronics
- Abdul Rehman + 4 more
This study introduces a machine learning–driven extended reality (XR) interaction framework that leverages electroencephalography (EEG) for decoding consumer intentions in immersive decision-making tasks, demonstrated through functional food purchasing within a simulated autonomous vehicle setting. Recognizing inherent limitations in traditional “Preference vs. Non-Preference” EEG paradigms for immersive product evaluation, we propose a novel and robust “Rest vs. Intention” classification approach that significantly enhances cognitive signal contrast and improves interpretability. Eight healthy adults participated in immersive XR product evaluations within a simulated autonomous driving environment using the Microsoft HoloLens 2 headset (Microsoft Corp., Redmond, WA, USA). Participants assessed 3D-rendered multivitamin supplements systematically varied in intrinsic (ingredient, origin) and extrinsic (color, formulation) attributes. Event-related potentials (ERPs) were extracted from 64-channel EEG recordings, specifically targeting five neurocognitive components: N1 (perceptual attention), P2 (stimulus salience), N2 (conflict monitoring), P3 (decision evaluation), and LPP (motivational relevance). Four ensemble classifiers (Extra Trees, LightGBM, Random Forest, XGBoost) were trained to discriminate cognitive states under both paradigms. The “Rest vs. Intention” approach achieved high cross-validated classification accuracy (up to 97.3% in this sample) and area under the curve (AUC > 0.97). SHAP-based interpretability identified dominant contributions from the N1, P2, and N2 components, aligning with neurophysiological processes of attentional allocation and cognitive control. These findings provide preliminary evidence of the viability of ERP-based intention decoding within a simulated autonomous-vehicle setting. Our framework serves as an exploratory proof-of-concept foundation for future development of real-time, BCI-enabled in-transit commerce systems, while underscoring the need for larger-scale validation in authentic AV environments and raising important considerations for ethics and privacy in neuromarketing applications.
- Research Article
16
- 10.1109/tpami.2025.3568433
- Sep 1, 2025
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Hao Zhang + 4 more
Existing image fusion methods struggle to accommodate composite degradation and do not support users flexibly modulating the semantic objects of interest. To address these challenges, this study proposes a composite degradation-robust image fusion framework with language-driven semantics, called OmniFuse. Firstly, OmniFuse establishes a novel multi-modal information fusion paradigm based on the latent diffusion model (LDM). By projecting the information fusion function into the latent space of the LDM, the information fusion process is seamlessly integrated with the diffusion process. Thus, OmniFuse fully leverages the powerful generative capabilities of LDM to eliminate composite degradation, thereby achieving highly robust image fusion. Secondly, OmniFuse develops a language-driven controllable fusion strategy to strengthen fusion flexibility. It employs a language-driven feature fusion module (LFFM) to receive the specified localization priori, dynamically aggregating multi-modal features. Within LFFM, a visual enhancement regularization is introduced to highlight objects of interest for capturing perceptual attention, while reverse semantic driving is established to strengthen their semantic attributes. Together, the visual and semantic constraints can implicitly correct the imperfect localization priori, further refining the accuracy of language-driven control. Extensive experiments demonstrate the omnipotent performance of OmniFuse, with significant advantages in robustness and flexibility compared to state-of-the-art methods.
- Research Article
1
- 10.3390/jmse13081528
- Aug 9, 2025
- Journal of Marine Science and Engineering
- Shibo Song + 1 more
Precise underwater object detectors can provide Autonomous Underwater Vehicles (AUVs) with good situational awareness in underwater environments, supporting a wide range of unmanned exploration missions. However, the quality of optical imaging is often insufficient to support high detector accuracy due to poor lighting and the complexity of underwater environments. Therefore, this paper develops an efficient and precise object detector that maintains high recognition accuracy on degraded underwater images. We design a Cross Spatial Global Perceptual Attention (CSGPA) mechanism to achieve accurate recognition of target and background information. We then construct an Efficient Multi-Scale Weighting Feature Pyramid Network (EMWFPN) to eliminate computational redundancy and increase the model’s feature-representation ability. The proposed Occlusion-Robust Wavelet Network (ORWNet) enables the model to handle fine-grained frequency-domain information, enhancing robustness to occluded objects. Finally, EMASlideloss is introduced to alleviate sample-distribution imbalance in underwater datasets. Our architecture achieves 81.8% and 83.8% mAP on the DUO and UW6C datasets, respectively, with only 7.2 GFLOPs, outperforming baseline models and balancing detection precision with computational efficiency.
- Research Article
- 10.1016/j.chbr.2025.100767
- Aug 1, 2025
- Computers in Human Behavior Reports
- Tongwen Hu + 2 more
From cognitive asymmetries to NLP optimization: Quantifying type-frequency interactions in emotional text processing through eye-tracking experiments
- Research Article
3
- 10.1038/s40494-025-01839-z
- Jun 17, 2025
- npj Heritage Science
- Shu Zhou + 4 more
Oracle Bone Script, as the earliest known form of Chinese writing, plays a significant role in archaeological and historical studies due to the importance of recognizing its imagery. However, existing deep learning technologies face challenges in automatically recognizing Oracle Bone Script, including the lack of fine control over local features, the neglect of texture information, and insufficient learning of highly discriminative features. To address these issues, this paper introduces a novel image processing model for Oracle Bone Script named OracleNet. OracleNet consists of an Adaptive Deformation Module, a Texture–Structure Decoupling Module, and a Multi-Level Structured Perceptual Attention Module. The Adaptive Deformation Module enhances local control through adaptive points, maintaining the semantic integrity of the script; the Texture–Structure Decoupling Module distinguishes between texture and structural elements, improving recognition accuracy; the Multi-Level Structured Perceptual Attention Module refines differences through macro and micro perspectives. OracleNet has been validated on multiple datasets, achieving state-of-the-art performance on the Oracle-241, OBC306 and Oracle-MNIST datasets, demonstrating the model’s superior accuracy and robustness.
- Research Article
3
- 10.1038/s41598-025-93158-3
- Mar 7, 2025
- Scientific Reports
- Weiyi Wei + 3 more
Cellular micronucleus detection plays an important role in pathological toxicology detection and early cancer diagnosis. To address the challenges of tiny targets, high inter-class similarity, limited sample data and class imbalance in the field of cellular micronucleus image detection, this paper proposes a lightweight network called MobileViT-MN (Micronucleus), which integrates a multilayer perceptual attention mechanism. Considering that limited data and class imbalance may lead to overfitting of the model, we employ data augmentation to mitigate this problem. Additionally, based on domain adaptation, we introduce transfer learning. Furthermore, a novel Deep Separation-Decentralization module is designed to implement the reconstruction of the network, which employs attention mechanisms and an alternative strategy of deep separable convolution. Numerous ablation experiments are performed to validate the effectiveness of our method. The experimental results show that MobileViT-MN obtains outstanding performance on the augmented cellular micronucleus dataset, with an Avg_Acc of 0.933, an F1 score of 0.971, and an ROC score of 0.965. Compared with other classical algorithms, MobileViT-MN achieves superior classification performance.
- Research Article
5
- 10.1016/j.infrared.2024.105631
- Mar 1, 2025
- Infrared Physics and Technology
- Chunbo Zhao + 4 more
CMIFDF: A lightweight cross-modal image fusion and weight-sharing object detection network framework
- Research Article
- 10.1121/10.0036128
- Mar 1, 2025
- The Journal of the Acoustical Society of America
- Susan Nittrouer
The distribution of perceptual attention across the myriad acoustic properties of speech undergoes developmental shifts through the first decade of life, changing from a focus on dynamic spectral structure to other kinds of temporal, amplitude, and static spectral properties. These developmental changes accompany a gradual enhancement in sensitivity to phonological structure. A central question concerning spoken language acquisition by children with hearing loss who use cochlear implants (CIs) concerns how they navigate these developmental changes and what effect signal degradation has on developing language abilities, especially sensitivity to phonological structure. To explore these questions, this report describes outcomes of data collected from adolescents with normal hearing and adolescents with CIs. Perceptual weighting factors were computed for static and dynamic spectral properties using a fricative-vowel labeling paradigm. Measures of speech recognition, language abilities, word reading, and phonological processing were also obtained. Results showed that the adolescents with CIs weighted dynamic spectral structure hardly at all. Weighting of static spectral structure was largely related to their abilities to manipulate and retain phonological structure in memory. Overall, these findings indicate that supporting developmental shifts in perceptual weighting strategies should remain a goal of intervention for children with hearing loss who use CIs.
- Research Article
- 10.1027/1864-1105/a000457
- Feb 19, 2025
- Journal of Media Psychology
- Ralf Schmälzle + 3 more
This study introduces a novel VR-based approach to measure pupillary responses during media consumption. Researchers exposed participants to 30 video messages in a virtual TV viewing room, capturing pupil dilation and constriction via VR-integrated eye-tracking. By analyzing cross-receiver similarity (inter-subject correlations) of pupillometric responses, we could identify which specific video an individual was watching. This method worked best under normal viewing conditions and was sensitive to attentional manipulations. This study also found that messages with the most robust pupil response signatures were more likely to be remembered. Theoretical implications for quantifying media exposure and developing signatures of perceptual attention in individuals and audiences are discussed. Practically, this pupillary audience response measurement could be applied to various media formats, including screen-based media, social media, and VR/AR environments. In sum, the study highlights the potential of pupillometry in understanding audience engagement and response dynamics in naturalistic media consumption settings.
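The identification step described in the abstract above (matching one viewer's pupil trace to a video via inter-subject correlation) can be illustrated with a small sketch. This is not the study's analysis pipeline: the function names, the use of Pearson correlation against a per-video group-average template, and the toy data are assumptions made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def identify_video(trace, group_templates):
    """Return the name of the video whose group-average pupil trace
    correlates most strongly with the individual's trace."""
    return max(group_templates,
               key=lambda name: pearson(trace, group_templates[name]))


# Toy example: two hypothetical videos with opposite pupil-size trends.
templates = {"video_a": [1.0, 2.0, 3.0, 4.0], "video_b": [4.0, 3.0, 2.0, 1.0]}
guess = identify_video([1.1, 1.9, 3.2, 3.9], templates)
```

In practice the template for each video would be computed from the other viewers only, so that the individual being classified does not contribute to the pattern they are matched against.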
- Research Article
34
- 10.1088/1361-6501/adb2ad
- Feb 17, 2025
- Measurement Science and Technology
- Hongfeng Tao + 4 more
The targets in traffic sign detection and recognition scenarios are predominantly small, which frequently results in missed and erroneous detections due to their limited information content and the complex environment. To address these problems, this paper proposes a new network architecture, cross-dimensional and dual-domain feature fusion-you only look once (CDFF-YOLO), which integrates several modules. To overcome the difficulty of extracting information from small objects, the Multi-dimension Spatial information Fusion module is used to extract feature sequences at different dimensions by superposition. To address the loss of detail information for small objects, the Multi-branch Perceptual Attention module is embedded into the C2f module to capture feature information and enhance global-local feature information exchange. To handle uneven illumination and occlusion in the detection scene, the DFF module is employed to transform and fuse the small-object feature information extracted by the Space-to-depth convolution in the frequency and spatial domains, thereby enhancing the network’s capability to reconstruct and fuse feature information in both domains. Experiments on the TT100K dataset show that the enhanced algorithm increases mAP@50 by 3.7% and mAP@50:95 by 4.8%, and raises the average precision and average recall for small objects by 4.5% and 3.7%, respectively. Additionally, the frame rate remains at 157 frames per second. The improved algorithm also performs well on the CCTSDB dataset. These results indicate that CDFF-YOLO markedly improves the detection of traffic signs while maintaining detection speed.
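The abstract above relies on a space-to-depth step, which trades spatial resolution for channels so that downsampling discards no pixel values. The sketch below shows only that generic rearrangement on a single-channel image stored as nested lists. The function name and the row-major ordering of values within each block are assumptions, and the paper's convolution and fusion stages are not reproduced.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W single-channel image into an
    (H/block) x (W/block) grid of block*block-value channel vectors.
    Every input value is kept; only the layout changes."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image sides must be divisible by the block size")
    output = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Flatten one block x block patch, row by row, into a channel vector.
            row.append([image[top + dy][left + dx]
                        for dy in range(block)
                        for dx in range(block)])
        output.append(row)
    return output


# A 4 x 4 image becomes a 2 x 2 grid with 4 channels per cell.
grid = space_to_depth([[0, 1, 2, 3],
                       [4, 5, 6, 7],
                       [8, 9, 10, 11],
                       [12, 13, 14, 15]])
```

Unlike strided convolution or pooling, this keeps all 16 input values, which is the usual argument for using it when the objects of interest occupy only a few pixels.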
- Research Article
4
- 10.3390/s25020528
- Jan 17, 2025
- Sensors (Basel, Switzerland)
- Song Hong + 1 more
Spectrum sensing is recognized as a viable strategy to alleviate the scarcity of spectrum resources and to optimize their usage. In this paper, considering the time-varying characteristics and the dependence on various timescales within a time series of samples composed of in-phase (I) and quadrature (Q) component signals, we propose a multi-scale time-correlated perceptual attention model named MSTC-PANet. The model consists of multiple parallel temporal correlation perceptual attention (TCPA) modules, enabling us to extract features at different timescales and identify dependencies among features across various timescales. Our simulations show that MSTC-PANet significantly improves the detection of channel occupancy at low signal-to-noise ratios (SNR), particularly in untrained scenarios with lower SNR conditions and modulation uncertainties. The analysis of the ROC curve indicates that at an SNR of -20 dB, the proposed MSTC-PANet achieves a detection rate of 98% with a false alarm rate of 10%. Furthermore, MSTC-PANet, which has been trained using digital modulation techniques, also demonstrates applicability to analog modulation.
- Research Article
- 10.1109/tgrs.2025.3617582
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
- Yingao Wang + 5 more
As deep learning technology continues to advance in the field of meteorological forecasting, accurate radar echo extrapolation technology is crucial for two-hour precipitation prediction. Recent approaches commonly utilize 2D CNN and 3D CNN for feature extraction. However, the absence of high-level temporal sequence features prevents the model from acquiring sufficient information for accurate precipitation forecasting. To this end, we introduce MIPRNet for precise precipitation forecasting. The Multi-Information Extractor (MIE), based on graph convolution and Fourier transforms, captures high-level complex temporal features of extrapolation evolution and frequency characteristics of precipitation dynamics. Meanwhile, MHPA, which utilizes the multi-head attention mechanism, aggregates and captures potential precipitation evolution patterns. In the extrapolation process, MIPRNet uses the potential evolution patterns obtained from the information extraction part to perform extrapolation. Experimental results on two radar meteorological datasets demonstrate that MIPRNet outperforms existing models in terms of multiple authoritative metrics.