Related Topics

  • Invariant Image

Articles published on Robust Image Features

80 Search results
  • Research Article
  • 10.1109/tip.2026.3662579
Robust 2.5D Feature Matching in Light Fields via a Learnable Parameterized Depth-Degraded Projection.
  • Jan 1, 2026
  • IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
  • Meng Zhang + 4 more

Due to the loss of 3D information, accurate and robust 2D image feature matching remains challenging for many computer vision applications. This paper introduces a 2.5D feature that uses the disparity value from the light field Fourier disparity layer (FDL) as a rough proxy of scene depth. Without explicit depth estimation, a parameterized depth-degraded projection is proposed to construct the geometric transformation of paired features between two light fields. Then, we propose a parameterized learning solution to calculate the depth-degraded projection. This solution estimates a global constant fundamental matrix, a variable disparity-guided translation vector, and a depth compensation term using a very simple network. Although the 0.5D relative disparity provided by the FDL does not represent precise depth, it can also significantly reduce the depth ambiguity in feature matching. Therefore, the proposed solution achieves accurate feature-matching results by minimizing the sum of reprojection errors across all matching candidates. On the public light field feature-matching dataset, the proposed solution outperforms existing 2D image feature-matching solutions and light field feature-matching algorithms in terms of matching accuracy and robustness. The code is available online.

  • Research Article
  • Cite count: 3
  • 10.1109/tmi.2025.3564474
Domain-Generalized Discrete Diffusion Model for Cross-Domain Medical Image Segmentation.
  • Nov 1, 2025
  • IEEE transactions on medical imaging
  • Heran Yang + 3 more

Domain shift is a significant challenge in medical image segmentation, primarily due to variations in image acquisition protocols, modalities, etc. Domain shift often causes models trained on a source domain to perform poorly on unseen target domains. In this work, we introduce the Domain-Generalized Discrete Diffusion Model for Segmentation (DG-DDM-Seg), a diffusion-based generative model designed for single-source domain generalization in medical image segmentation. DG-DDM-Seg generates discrete conditional distributions of segmentation masks. To ensure domain independence, we employ two key strategies: 1) We extract robust features from conditional images to enhance the domain independence of the diffusion model. 2) We use both conditional images and pseudo-labels as inputs to improve cross-domain segmentation performance. Following this idea, we propose a two-path reverse diffusion process during training, utilizing a Robust Feature Extraction Subnet and a Mask-Generation Transformer to learn a domain-generalized discrete conditional distribution based on robust image features and pseudo-labels. This learned distribution is then used to generate segmentation masks for unseen target domains. Experimental results demonstrate that DG-DDM-Seg achieves state-of-the-art performance in cross-domain medical image segmentation, with domain shifts in modality, sequence, and site. The code is available at https://github.com/HeranYang/DG-DDM-Seg.

  • Research Article
  • 10.3390/s25216600
MemRoadNet: Human-like Memory Integration for Free Road Space Detection
  • Oct 27, 2025
  • Sensors (Basel, Switzerland)
  • Sidra Shafiq + 2 more

Detecting available road space is a fundamental task for autonomous driving vehicles, requiring robust image feature extraction methods that operate reliably across diverse sensor-captured scenarios. However, existing approaches process each input independently without leveraging Accumulated Experiential Knowledge (AEK), limiting their adaptability and reliability. In order to explore the impact of AEK, we introduce MemRoadNet, a Memory-Augmented (MA) semantic segmentation framework that integrates human-inspired cognitive architectures with deep-learning models for free road space detection. Our approach combines an InternImage-XL backbone with a UPerNet decoder and a Human-like Memory Bank system implementing episodic, semantic, and working memory subsystems. The memory system stores road experiences with emotional valences based on segmentation performance, enabling intelligent retrieval and integration of relevant historical patterns during training and inference. Experimental validation on the KITTI road, Cityscapes, and R2D benchmarks demonstrates that our single-modality RGB approach achieves competitive performance with complex multimodal systems while maintaining computational efficiency and achieving top performance among single-modality methods. The MA framework represents a significant advancement in sensor-based computer vision systems, bridging computational efficiency and segmentation quality for autonomous driving applications.

  • Research Article
  • 10.3389/fnins.2025.1658776
A multi-view multimodal deep learning framework for Alzheimer's disease diagnosis
  • Oct 1, 2025
  • Frontiers in Neuroscience
  • Jianxin Feng + 4 more

Introduction: Early diagnosis of Alzheimer's disease (AD) remains challenging due to the high similarity among AD, mild cognitive impairment (MCI), and cognitively normal (CN) individuals, as well as confounding factors such as population heterogeneity, label noise, and variations in imaging acquisition. Although multimodal neuroimaging techniques like MRI and PET can provide complementary information, current approaches are limited in multimodal fusion and multi-scale feature aggregation. Methods: We propose a novel multimodal diagnostic framework, Alzheimer's Disease Multi-View Multimodal Diagnostic Network (ADMV-Net), to enhance recognition accuracy across all AD stages. Specifically, a dual-pathway Hybrid Convolution ResNet module is designed to fuse global semantic and local boundary information, enabling robust three-dimensional medical image feature extraction. Furthermore, a Multi-view Fusion Learning mechanism, which comprises a Global Perception Module, a Multi-level Local Cross-modal Aggregation Network, and a Bidirectional Cross-Attention Module, is introduced to efficiently capture and integrate multimodal features from multiple perspectives. Additionally, a Regional Interest Perception Module is incorporated to highlight brain regions strongly associated with AD pathology. Results: Extensive experiments on public datasets demonstrate that ADMV-Net achieves 94.83% accuracy and 95.97% AUC in AD versus CN classification, significantly outperforming mainstream methods. The framework also shows strong discriminative capability and excellent generalization performance in multi-class classification tasks. Discussion: These findings suggest that ADMV-Net effectively leverages multimodal and multi-view information to improve the diagnostic accuracy of AD. By integrating global, local, and regional features, the framework provides a promising tool for assisting early diagnosis and clinical decision-making in Alzheimer's disease.
The implementation code is publicly available at https://github.com/zhaoxinyu-1/ADMV-Net.

  • Research Article
  • Cite count: 1
  • 10.1007/s11760-025-04369-0
Finger vein recognition using an ensemble of KNN classifiers based on robust image features
  • Jun 26, 2025
  • Signal, Image and Video Processing
  • Huvaida Manzoor + 2 more


  • Research Article
  • Cite count: 19
  • 10.1002/qute.202500224
Dual Discriminators Quantum Generation Adversarial Network Based on Quantum Convolutional Neural Network
  • Jun 25, 2025
  • Advanced Quantum Technologies
  • Li‐Hua Gong + 3 more

As a crucial component in quantum machine learning, quantum generative adversarial networks play a significant role in generating discrete distributions. However, due to issues such as the vanishing gradient and mode collapse, the results generated by quantum generative adversarial models are sometimes of suboptimal quality. Given the robust image feature extraction capabilities of quantum convolutional neural networks, a hybrid quantum convolutional neural network model is incorporated into the quantum generative adversarial network as a discriminator. Unlike the traditional multi‐layer linear structure, this discriminator adopts a parallel structure that can analyze both the local and the global features of an image simultaneously. It can promptly detect defects in the global distribution of generated data, prompting the generator to explore more data patterns and avoid falling into mode collapse. The feasibility of this solution is verified through generation experiments on the handwritten dataset, Fashion‐MNIST dataset, and CIFAR‐100 dataset. The experimental results show that the FID (Fréchet Inception Distance) scores of the generated results on these three datasets reach 14, 20, and 17 respectively, demonstrating the performance of this image generation algorithm.
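The FID figures quoted above are Fréchet distances between Gaussian fits of feature statistics for real and generated images. As a rough illustration only, the sketch below computes the one-dimensional special case, where the distance reduces to (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2. It is not the paper's evaluation code: real FID uses multivariate Inception-network features and a matrix square root, and the function name and the choice of population variance here are assumptions made for this example.

```python
from math import sqrt
from statistics import fmean, pvariance


def frechet_distance_1d(real, generated):
    """Squared Frechet distance between 1-D Gaussian fits of two samples.

    Illustrative scalar stand-in for FID (assumption: population variance).
    """
    mu_r, mu_g = fmean(real), fmean(generated)
    var_r, var_g = pvariance(real), pvariance(generated)
    # var_r + var_g - 2*sqrt(var_r*var_g) equals (sigma_r - sigma_g)**2
    return (mu_r - mu_g) ** 2 + var_r + var_g - 2 * sqrt(var_r * var_g)


if __name__ == "__main__":
    # Same spread, means one unit apart
    print(frechet_distance_1d([0, 2], [1, 3]))
```

Lower values mean the two feature distributions are closer; identical samples give a distance of zero up to rounding.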

  • Research Article
  • 10.1364/ao.562412
Ps-ViT: phase space vision transformer pre-training for the depth estimation in computer-generated holograms.
  • Jun 24, 2025
  • Applied optics
  • Madali Nabil + 1 more

Recent advances in neural network pre-training have significantly improved state-of-the-art performance across various computer vision tasks, especially in scenarios with limited labeled data. These improvements stem from the ability to learn transferable and robust image feature descriptors from large-scale, unlabeled, and often noisy datasets through self-supervised training. Despite these successes, the field of holography has seen limited benefits from such approaches due to the challenges in developing effective pre-training strategies tailored to holographic data. In this work, we address this gap by introducing a pre-training method leveraging the hologram phase space representation. This approach enables the learning of efficient feature descriptors optimized for dense depth map estimation, unlocking new potential in holographic imaging applications.

  • Research Article
  • Cite count: 10
  • 10.1007/s00198-025-07541-x
Machine learning is changing osteoporosis detection: an integrative review.
  • Jun 10, 2025
  • Osteoporosis international : a journal established as result of cooperation between the European Foundation for Osteoporosis and the National Osteoporosis Foundation of the USA
  • Yuji Zhang + 8 more

Machine learning drives osteoporosis detection and screening with higher clinical accuracy and accessibility than traditional osteoporosis screening tools. This review takes a step-by-step view of machine learning for osteoporosis detection, providing insights into today's osteoporosis detection and the outlook for the future. The early diagnosis and risk detection of osteoporosis have always been crucial and challenging issues in the medical field. With the in-depth application of artificial intelligence technology, especially machine learning technology in the medical field, significant breakthroughs have been made in the application of early diagnosis and risk detection of osteoporosis. Machine learning is a multidimensional technical system that encompasses a wide variety of algorithm types. Machine learning algorithms have become relatively mature and developed over many years in medical data processing. They possess stable and accurate detection performance, laying a solid foundation for the detection and diagnosis of osteoporosis. As an essential part of the machine learning technical system, deep-learning algorithms are complex algorithm models based on artificial neural networks. Due to their robust image recognition and feature extraction capabilities, deep learning algorithms have become increasingly mature in the early diagnosis and risk assessment of osteoporosis in recent years, opening new ideas and approaches for the early and accurate diagnosis and risk detection of osteoporosis. This paper reviewed the latest research over the past decade, ranging from relatively basic and widely adopted machine learning algorithms combined with clinical data to more advanced deep learning techniques integrated with imaging data such as X-ray, CT, and MRI. 
By analyzing the application of algorithms at different stages, we found that these basic machine learning algorithms performed well when dealing with single structured data but encountered limitations when handling high-dimensional and unstructured imaging data. On the other hand, deep learning can significantly improve detection accuracy. It does this by automatically extracting image features, especially in image histological analysis. However, it faces challenges. These include the "black-box" problem, heavy reliance on large amounts of labeled data, and difficulties in clinical interpretability. These issues highlighted the importance of model interpretability in future machine learning research. Finally, we expect to develop a predictive model in the future that combines multimodal data (such as clinical indicators, blood biochemical indicators, imaging data, and genetic data) integrated with electronic health records and machine learning techniques. This model aims to present a skeletal health monitoring system that is highly accessible, personalized, convenient, and efficient, furthering the early detection and prevention of osteoporosis.

  • Research Article
  • Cite count: 1
  • 10.52783/jisem.v10i47s.9258
Image Caption Generation Using Deep Learning
  • May 16, 2025
  • Journal of Information Systems Engineering and Management
  • D Prannav

Image caption generation, a primary application domain in computer vision and natural language processing, produces text captions of images from deep learning models. The current paper suggests a CNN-LSTM-based system for automatic captioning, where pre-trained convolutional neural networks (CNNs) are employed for image feature extraction and long short-term memory (LSTM) networks for sequential text generation. Inspired by the Flickr8k dataset, the paper emphasizes primary challenges such as vocabulary sparsity, overfitting, and computational complexity. Experimental results achieve BLEU scores of 0.66 or more, exhibiting coherent caption generation, while qualitative analysis reveals captioning inefficiencies for complex scenes. The paper also discusses future enhancements such as transformer-based architectures and attention mechanisms to improve caption accuracy and accessibility. The work contributes to improving large-scale human-computer interaction through multimodal AI systems. Caption generation is an important area at the intersection of computer vision and natural language processing, including the generation of descriptive text captions describing images using advanced deep-learning methodologies. The current paper suggests a new approach through a hybrid CNN-LSTM-based system for automatic captioning. This state-of-the-art model employs pre-trained convolutional neural networks (CNNs) for robust image feature extraction to identify and interpret relevant features in an image. These identified features are then fed to long short-term memory (LSTM) networks adept at generating coherent and relevant sequential text based on the visual input. The experimental results revealed excellent BLEU scores of 0.66 or higher, which reflects the model's capacity to generate captions that are not only accurate but also linguistically sound.
Qualitative analysis of the generated captions does call out inefficiencies in handling complicated scenes with more than one element or activity, and it suggests where there is potential for improvement in the future. In the future, the paper foresees potential enhancements, such as the application of transformer-based models and attention, which would significantly improve caption accuracy and user experience for accessibility. Overall, this work contributes to advancing the state of large-scale human-computer interaction by developing sophisticated multimodal AI systems for interpreting and generating human-like text from visual inputs.
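For readers unfamiliar with the BLEU metric reported in this abstract, the following is a minimal sketch of a sentence-level score: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It assumes a single reference caption, whitespace tokenization, n-grams up to length 2 and no smoothing; none of these choices are stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter
from math import exp, log


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference
    penalty = 1.0 if len(cand) > len(ref) else exp(1 - len(ref) / len(cand))
    return penalty * exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(bleu("the dog runs", "a dog runs"))
```

Published BLEU scores are usually corpus-level and may use up to 4-grams and multiple references, so values from this sketch are not directly comparable with the 0.66 reported above.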

  • Research Article
  • Cite count: 11
  • 10.3389/fnbot.2024.1521603
MGFusion: a multimodal large language model-guided information perception for infrared and visible image fusion
  • Dec 23, 2024
  • Frontiers in Neurorobotics
  • Zengyi Yang + 3 more

Existing image fusion methods primarily focus on complex network structure designs while neglecting the limitations of simple fusion strategies in complex scenarios. To address this issue, this study proposes a new method for infrared and visible image fusion based on a multimodal large language model. The method proposed in this paper fully considers the high demand for semantic information in enhancing image quality as well as the fusion strategies in complex scenes. We supplement the features in the fusion network with information from the multimodal large language model and construct a new fusion strategy. To achieve this goal, we design CLIP-driven Information Injection (CII) approach and CLIP-guided Feature Fusion (CFF) strategy. CII utilizes CLIP to extract robust image features rich in semantic information, which serve to supplement the information of infrared and visible features, thereby enhancing their representation capabilities for the scene. CFF further utilizes the robust image features extracted by CLIP to select and fuse the infrared and visible features after the injection of semantic information, addressing the challenges of image fusion in complex scenes. Compared to existing methods, the main advantage of the proposed method lies in leveraging the powerful semantic understanding capabilities of the multimodal large language model to supplement information for infrared and visible features, thus avoiding the need for complex network structure designs. Experimental results on multiple public datasets validate the effectiveness and superiority of the proposed method.

  • Research Article
  • Cite count: 6
  • 10.1088/1361-6501/ad7e3e
A multi-scale deep residual network-based guided wave imaging evaluation method for fatigue crack quantification
  • Oct 22, 2024
  • Measurement Science and Technology
  • Hutao Jing + 3 more

As a promising structural health monitoring technology, guided wave (GW) imaging is gaining increasing attention for crack monitoring of aircraft structures. However, actual fatigue crack propagation is a complex dynamically evolving process affected by various variabilities. It is still challenging to accurately track and quantify the dynamic fatigue crack propagation with GW imaging methods. Therefore, in order to achieve more accurate fatigue crack quantification, this paper proposes a multi-scale deep residual network-based GW imaging evaluation method. A convolutional neural network (CNN) is utilized to evaluate the entire pixel distribution of GW imaging maps to fuse damage-related information from multiple GW monitoring paths. By designing multi-scale convolutional kernels and deep residual learning, a robust quantitative image feature extraction is ensured with the dynamic evolution process of fatigue crack growth and the performance degradation is avoided as the CNN goes deeper, thereby improving the quantification accuracy. The method is validated on a fatigue test of landing gear beams, which are important load-carrying aircraft structural components. The results demonstrate that the proposed method can extract multi-scale crack length-related features and accurately track fatigue crack propagations. For batch specimens, the maximum quantification error is reduced from the original 6.1 mm to 1.6 mm, marking a significant improvement.

  • Research Article
  • Cite count: 12
  • 10.1038/s41598-024-73853-3
DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing
  • Sep 27, 2024
  • Scientific Reports
  • Gaoshuang Huang + 5 more

Using visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions, is generally unsatisfactory. Therefore, obtaining efficient and robust image feature descriptors in complex environments is a pressing issue. In this study, we utilized the DINOv2 model as the backbone for trimming and fine-tuning to extract robust image features and employed a feature mix module to aggregate image features, resulting in globally robust and generalizable descriptors that enable high-precision VPR. We experimentally demonstrated that the proposed DINO-Mix outperforms the current state-of-the-art (SOTA) methods. Using test sets having lighting variations, seasonal changes, and occlusions such as Tokyo24/7, Nordland, and SF-XL-Testv1, our proposed architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively, and exhibited an average accuracy improvement of 5.14%. In addition, we compared it with other SOTA methods using representative image retrieval case studies, and our architecture outperformed its competitors in terms of VPR performance. Furthermore, we visualized the attention maps of DINO-Mix and other methods to provide a more intuitive understanding of their respective strengths. These visualizations serve as compelling evidence of the superiority of the DINO-Mix framework in this domain.
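The Top-1 accuracy figures above come from image retrieval: each query descriptor is matched to its nearest database descriptor, and the match counts as a hit when it depicts the right place. A minimal sketch of that evaluation follows, assuming cosine similarity and exact place labels. VPR benchmarks normally use a distance threshold on geographic position instead of labels, and the names and toy descriptors here are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def top1_accuracy(queries, database):
    """Fraction of queries whose most similar database entry has the same label.

    Both arguments are lists of (descriptor, place_label) pairs.
    """
    hits = 0
    for q_desc, q_place in queries:
        _, best_place = max(database, key=lambda item: cosine(q_desc, item[0]))
        hits += best_place == q_place
    return hits / len(queries)


if __name__ == "__main__":
    database = [([1.0, 0.0], "A"), ([0.0, 1.0], "B")]
    queries = [([0.9, 0.1], "A"), ([0.2, 0.8], "B"), ([0.6, 0.4], "B")]
    print(top1_accuracy(queries, database))  # two of the three queries match
```

A Top-k variant would count a hit when any of the k most similar entries is correct.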

  • Research Article
  • 10.54254/2755-2721/46/20241314
Angle calculation method based on Cognex binary image processing and edge tool positioning
  • Mar 15, 2024
  • Applied and Computational Engineering
  • Hangkai Zhong

To address the lack of a suitable method for measuring the angle between a product's upper and lower cylinder axes in a specific horizontal rotation position, a calculation method based on the Cognex vision system for automatic angle measurement was proposed. This algorithm uses binary image processing technology to reduce interference caused by variations in surface roughness and the resulting inconsistencies in the reflection effect. This provides the basis for robust image feature searching. It uses edge tools to locate the points on either side of the cylindrical product and averages them to obtain the axis coordinates at these positions, from which the angle between each product's upper and lower axes is determined. Simulation results show that this algorithm utilizes binary image processing to effectively filter product differences, which plays a role in capturing product image features continuously and reliably. The edge tool based on feature location can accurately locate product edges and complete target angle calculations. It has certain production field application capabilities regarding image processing effectiveness and computational logic accuracy.

  • Research Article
  • Cite count: 6
  • 10.1109/jstars.2024.3422310
Vision-Based 3-D Localization of UAV Using Deep Image Matching
  • Jan 1, 2024
  • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Mansoor Khurshid + 4 more

Unmanned aerial vehicles (UAVs) have revolutionized various industries by providing efficient and automated flight capabilities. However, reliance on GPS and traditional navigation systems poses challenges in scenarios where signal interference or failures occur. In this research, we present a novel computer-vision-based method to enhance UAV navigation, enabling accurate height and location estimation. Our approach utilizes a sophisticated network that leverages a pair of images to estimate UAV height. The pyramid stereo-matching network is employed to extract robust image features and generate a disparity map. Subsequently, a custom network processes and convolves these data, employing diverse computer vision techniques to achieve precise height estimation. To evaluate the effectiveness of our proposed method, we collected a comprehensive dataset by conducting flights with a Phantom 4 Pro drone over the NUST Main campus, H-12 Islamabad. The dataset encompasses images captured at 10 different heights, spanning from 100 to 280 m, with flights evenly spaced 20 m apart. In rigorous evaluations, our approach demonstrates promising results compared to existing methods. By liberating UAVs from reliance on GPS, this vision-based 3-D localization technique holds immense potential to ensure successful flights even in challenging environments.
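The abstract above relies on a disparity map as the intermediate cue for height. For a rectified stereo pair, disparity and depth are linked by the standard pinhole relation Z = f * B / d. The sketch below shows only that textbook relation; the paper itself feeds the disparity map to a custom network, and the focal length and baseline values used here are made-up illustration numbers.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity for a rectified stereo pair: Z = f * B / d.

    disparity_px and focal_px are in pixels, baseline_m is in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 2 m between the two views
    print(depth_from_disparity(20.0, 1000.0, 2.0))
```

Because depth is inversely proportional to disparity, a fixed pixel error in disparity produces a larger depth error at greater heights.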

  • Research Article
  • Cite count: 14
  • 10.1109/joe.2023.3310079
CEWformer: A Transformer-Based Collaborative Network for Simultaneous Underwater Image Enhancement and Watermarking
  • Jan 1, 2024
  • IEEE Journal of Oceanic Engineering
  • Jun Wu + 5 more

Since the copyright of the enhanced underwater image should be protected, we propose a transformer-based collaborative network (CEWformer) for simultaneous underwater image enhancement and watermarking. In CEWformer, a channel self-attention transformer (CSAT) is deployed by mining channel correlations to enhance channels with severe color attenuation. To emphasize quality degradation and inconspicuous regions, a mixed self-attention transformer (MSAT) is also employed by computing both channel and spatial correlations for improving the image quality. Meanwhile, CEWformer integrates a watermark fusion transformer (WFT) to capture robust image features by modeling the cross-domain relationship between the image and watermark for increasing watermarking robustness. In addition, multiscale image and watermark features are fused to gain multiple watermark copies for increasing robustness as well. Extensive experimental results demonstrate that the proposed CEWformer can enhance the underwater image and embed a robust watermark simultaneously and effectively. Compared to existing underwater image enhancement methods, the visual quality of the proposed CEWformer is better, which shows the low effect of watermark embedding on the image quality. Furthermore, the proposed CEWformer is superior to existing image watermarking models in terms of watermarking robustness and invisibility.

  • Research Article
  • Cite count: 9
  • 10.1016/j.isprsjprs.2023.12.006
TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes
  • Dec 19, 2023
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Shengjun Tang + 7 more


  • Research Article
  • 10.22214/ijraset.2023.57004
Automated Image Captioning with Deep Learning
  • Nov 30, 2023
  • International Journal for Research in Applied Science and Engineering Technology
  • Sameer Indora + 2 more

In recent years, deep learning has transformed computer vision, giving rise to automated image captioning systems bridging the gap between visual content and natural language. This paper presents an innovative approach to automated image captioning, combining deep learning models and methodologies. Our system employs Convolutional Neural Networks (CNNs) for robust image feature extraction and Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, for generating coherent captions. It is trained on diverse image-caption datasets, learning intricate associations between visual content and textual descriptions.

  • Research Article
  • Cite count: 10
  • 10.1016/j.optlastec.2023.110298
An unsupervised deep learning framework for large-scale lung CT deformable image registration
  • Nov 6, 2023
  • Optics & Laser Technology
  • Yuqian Zhao + 4 more


  • Open Access
  • Research Article
  • 10.1088/1742-6596/2650/1/012046
Automatic Matching of Optical and SAR Images Based on Attention Structure Features
  • Nov 1, 2023
  • Journal of Physics: Conference Series
  • Jiwei Deng + 5 more

Due to the disparity in imaging techniques, significant radiometric and geometric variances exist between optical and Synthetic Aperture Radar (SAR) images, making automatic and accurate matching a challenging task in current research. Handcrafted structural features have shown some success in heterogeneous image matching in recent years. However, improving their matching performance manually proves to be difficult. As a result, this work presents a matching strategy based on attention-enhanced structural feature representation to improve the matching accuracy of optical and SAR images. In this research, a novel multi-branch global attention module is built on handcrafted structural feature extraction. This module can focus on the common information of structural feature descriptors in space and channel, extracting finer and more robust image features. Then, the proposed method utilizes the sum of squared difference (SSD) learning metric, which is based on the fast Fourier transform, to develop a loss function. This loss function is then used to train positive and negative samples in order to enhance the discriminative ability of the model. Experimental results obtained from training and testing on numerous optical and SAR datasets demonstrate that the proposed method significantly improves the accuracy of matching optical and SAR images compared to both current structural feature matching methods and advanced deep learning matching models.
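The abstract mentions an SSD metric computed through the fast Fourier transform. The underlying identity is that SSD(s) = sum(f^2) + sum(t^2) - 2 * corr(s), where the cross-correlation term can be evaluated for all shifts at once in the frequency domain. The sketch below checks this on 1-D circular signals. It is an illustration of the identity only, not the paper's loss function: it uses a naive O(n^2) DFT from the standard library in place of a real FFT, and all names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def ssd_direct(f, t):
    """Circular SSD between f shifted by s and template t, for every shift s."""
    n = len(f)
    return [sum((f[(i + s) % n] - t[i]) ** 2 for i in range(n)) for s in range(n)]


def ssd_fourier(f, t):
    """Same SSD values, with the correlation term computed in the frequency domain."""
    spectrum_f, spectrum_t = dft(f), dft(t)
    corr = dft([a * b.conjugate() for a, b in zip(spectrum_f, spectrum_t)], inverse=True)
    energy = sum(v * v for v in f) + sum(v * v for v in t)
    return [energy - 2 * c.real for c in corr]


if __name__ == "__main__":
    f = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
    t = [1.0, 3.0, 1.0, 0.0, 0.0, 0.0]  # f shifted left by one sample
    scores = ssd_fourier(f, t)
    print(min(range(len(scores)), key=lambda s: scores[s]))  # best shift
```

With a true FFT the cost per image pair drops from O(n^2) to O(n log n), which is what makes dense SSD search practical on large feature maps.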

  • Research Article
  • Citations: 16
  • 10.1109/tie.2022.3212422
A Robust Pixel-Wise Prediction Network With Applications to Industrial Robotic Grasping
  • Aug 1, 2023
  • IEEE Transactions on Industrial Electronics
  • Xuebing Liu + 7 more

Accurate object detection and 6D pose estimation are the key technologies in robotic grasping applications, where efficiency and robustness are the two most desirable goals. Especially for textureless industrial parts, it is difficult for most existing methods to extract robust image features from cluttered scenarios with heavy occlusion. To address this challenge, we propose a novel pixel-wise prediction strategy using local features to infer global information based on the inherent local-global relations of rigid objects. This strategy is robust to missing or disturbed local information since each pixel has an independent prediction, and the dense prediction manner can mitigate the instability caused by outliers. Accordingly, we first generate dense pixel-wise predictions of the object category, center, and keypoint from image features extracted by an encoder-decoder network. Then, these predictions are used to vote on and identify the keypoint locations of the specific instance object, and finally, the poses are estimated from the keypoints by an uncertainty PnP (Perspective-n-Point) algorithm. Experiments on various scenarios are implemented to illustrate the advantages of our approach on severe industrial scenes, and a robotic grasping platform is constructed to evaluate its application performance.
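The abstract does not say how the per-pixel predictions are aggregated, so the sketch below shows only one common voting scheme, under stated assumptions: each pixel is assumed to predict an offset to the keypoint, the implied locations are binned, and the bin with the most votes wins. The function `vote_keypoint`, its `bin_size` parameter, and the input layout are hypothetical and not taken from the paper.

```python
from collections import Counter


def vote_keypoint(pixel_predictions, bin_size=1.0):
    """Pick the keypoint location that most pixels agree on.

    pixel_predictions: iterable of ((x, y), (dx, dy)) pairs, where (dx, dy)
    is the offset from the pixel to the keypoint (an assumed encoding).
    Returns ((kx, ky), vote_count) for the winning bin.
    """
    votes = Counter()
    for (x, y), (dx, dy) in pixel_predictions:
        # Each pixel votes independently for the location it points at.
        bx = round((x + dx) / bin_size)
        by = round((y + dy) / bin_size)
        votes[(bx, by)] += 1
    if not votes:
        raise ValueError("no predictions to vote with")
    (bx, by), count = votes.most_common(1)[0]
    return (bx * bin_size, by * bin_size), count


predictions = [
    ((0, 0), (10, 5)),       # points at (10, 5)
    ((3, 4), (7, 1)),        # points at (10, 5)
    ((20, 20), (-10, -15)),  # points at (10, 5)
    ((1, 1), (50, 50)),      # outlier, e.g. an occluded or mispredicted pixel
]
keypoint, support = vote_keypoint(predictions)
```

Because every pixel casts its own vote, a single outlier cannot move the result, which illustrates the robustness argument made in the abstract.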
