Global feature enhancement and skip-connected fusion for grasping detection

Similar Papers
  • Research Article
  • Citations: 4
  • 10.1016/j.ecoinf.2025.103029
Multiscale feature fusion and enhancement in a transformer for the fine-grained visual classification of tree species
  • May 1, 2025
  • Ecological Informatics
  • Yanqi Dong + 4 more

  • Research Article
  • Citations: 5
  • 10.1007/s11063-024-11547-7
FEASE: Feature Selection and Enhancement Networks for Action Recognition
  • Mar 6, 2024
  • Neural Processing Letters
  • Lu Zhou + 2 more

Reinforcement of motor features is necessary in action recognition tasks. In this work, we propose an efficient feature reinforcement model, termed Feature Selection and Enhancement Networks (FEASE-Net). The core of FEASE-Net is the FEASE module, which adaptively captures input features at multiple scales and reinforces them globally. The FEASE module is composed of two sub-modules, Feature Selection (FS) and Feature Enhancement (FE). FS focuses on adaptive attention and selection of input features through a multi-scale structure with an attention mechanism, and FE employs channel attention to enhance globally useful feature information. To assess the effectiveness of FEASE-Net, we undertake a series of extensive experiments on two benchmark datasets, namely Kinetics 400 and Something-Something V2. Our proposed FEASE-Net achieves competitive performance compared with previous state-of-the-art methods that use similar backbones.
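
As an illustration of what "channel attention" can mean, the sketch below implements one common form, a squeeze-and-excitation style gate, in plain Python. The abstract does not say which form FEASE-Net uses, so this is an assumption; the function name, the weight matrices `w1` and `w2`, and the toy input are hypothetical, and biases are omitted for brevity.

```python
import math

def se_channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gate (illustrative sketch).

    feature_map: list of C channels, each a list of rows of floats.
    w1: C_reduced x C weights, w2: C x C_reduced weights. Both stand in
    for learned parameters and are hypothetical here.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / sum(len(row) for row in ch)
        for ch in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(ws, squeezed))) for ws in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(ws, hidden))))
        for ws in w2
    ]
    # Scale: every value in a channel is multiplied by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Toy input: two 2x2 channels with constant values 1.0 and 3.0.
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]        # reduce 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expand back to 2 channel gates
scaled, gates = se_channel_attention(features, w1, w2)
print(gates)
```

With these toy weights the first channel is emphasised and the second suppressed, which is the re-weighting effect that channel attention is meant to provide.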

  • Research Article
  • Citations: 17
  • 10.1038/s41598-025-89096-9
MedFuseNet: fusing local and global deep feature representations with hybrid attention mechanisms for medical image segmentation
  • Feb 11, 2025
  • Scientific Reports
  • Ruiyuan Chen + 10 more

Medical image segmentation plays a crucial role in addressing emerging healthcare challenges. Although several impressive deep learning architectures based on convolutional neural networks (CNNs) and Transformers have recently demonstrated remarkable performance, there is still potential for further performance improvement due to their inherent limitations in capturing feature correlations of input data. To address this issue, this paper proposes a novel encoder-decoder architecture called MedFuseNet that aims to fuse local and global deep feature representations with hybrid attention mechanisms for medical image segmentation. More specifically, the proposed approach contains two branches for feature learning in parallel: one leverages CNNs to learn local correlations of input data, and the other utilizes Swin-Transformer to capture global contextual correlations of input data. For feature fusion and enhancement, the designed hybrid attention mechanisms combine four different attention modules: (1) an atrous spatial pyramid pooling (ASPP) module for the CNN branch, (2) a cross attention module in the encoder for fusing local and global features, (3) an adaptive cross attention (ACA) module in skip connections for further fusion, and (4) a squeeze-and-excitation attention (SE-attention) module in the decoder for highlighting informative features. We evaluate our proposed approach on the public ACDC and Synapse datasets, achieving average DSC of 89.73% and 78.40%, respectively. Experimental results on these two datasets demonstrate the effectiveness of our proposed approach on medical image segmentation tasks, outperforming the other state-of-the-art approaches compared.
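
For readers unfamiliar with the DSC figures quoted above, the Dice similarity coefficient of a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch on flattened binary masks; the function name and the empty-mask convention are assumptions for illustration, not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient of two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, target))  # 2 * 2 / (3 + 3)
```

Benchmark scores such as those above are typically averaged over classes and test cases; the exact averaging protocol is not stated in the abstract.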

  • Conference Article
  • Citations: 8
  • 10.1109/icmew53276.2021.9455955
Global Feature Fusion Attention Network For Single Image Dehazing
  • Jul 5, 2021
  • Jie Luo + 3 more

As a common low-level visual task, image dehazing is widely used in various multimedia fields. However, it faces great challenges due to its ill-posed nature, especially for dense haze removal. In this paper, we propose an efficient end-to-end global feature fusion attention network for single image dehazing based on an encoder-decoder structure. We design a Multi-scale Global Context Fusion (MGCF) block to capture global context features and integrate them into the network to assist in the recovery of dense haze areas. After the global feature is aggregated in the context modeling module, a two-stream structure is used to combine two feature transformation modes and two feature fusion modes to obtain more comprehensive and accurate features. We simplify the pixel attention block and combine it with the MGCF block to form a Global Feature Fusion Attention (GFFA) module. Based on local residual learning and the GFFA module, the Feature Enhancement (FE) module is proposed as the basic module of the encoder-decoder, allowing the network to focus on enhancing more useful features. Experimental results show that the proposed method surpasses state-of-the-art dehazing methods, both quantitatively and qualitatively.

  • Research Article
  • Citations: 21
  • 10.1016/j.compag.2024.108635
A Bagging-SVM field-road trajectory classification model based on feature enhancement
  • Jan 16, 2024
  • Computers and Electronics in Agriculture
  • Weixin Zhai + 6 more

  • Research Article
  • Citations: 7
  • 10.1016/j.engappai.2023.107269
Visual Tracking based on deformable Transformer and spatiotemporal information
  • Oct 11, 2023
  • Engineering Applications of Artificial Intelligence
  • Ruixu Wu + 4 more

  • Research Article
  • Citations: 51
  • 10.1016/j.aei.2023.101898
Multi-task spatio-temporal augmented net for industry equipment remaining useful life prediction
  • Jan 1, 2023
  • Advanced Engineering Informatics
  • Haodong Li + 6 more

  • Research Article
  • 10.3724/sp.j.1089.2023-00319
A Lightweight Reversible Super-Resolution Network Based on Feature Enhancement
  • Apr 1, 2025
  • Journal of Computer-Aided Design & Computer Graphics
  • Benchen Yang + 2 more

Deep super-resolution reconstruction networks have a large number of parameters and slow inference speed, while lightweight networks are unable to express deep image features under complex environmental conditions. To address these issues, a lightweight reversible super-resolution network based on feature enhancement is proposed. First, edge feature residuals are introduced and combined with an edge similarity loss to guide model reconstruction and enhance the expression of texture contours. Then, a new wavelet feature kernel is added to support reconstruction at arbitrary scaling factors. Finally, a global feature extraction module is introduced, which embeds a self-attention mechanism in the feature map to extract global features. The proposed network performed better than SwinIR-light on the Set5 benchmark with a scaling factor of 4: it improved the PSNR by 0.41 dB, decreased the number of parameters by 244k, and reduced the inference time by 49.05%.
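
To put the PSNR figure in context, PSNR is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the reconstruction. The sketch below computes it on flattened pixel lists; the function name and the default peak value of 255 (8-bit images) are assumptions for illustration.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [52, 55, 61, 59]
reconstructed = [54, 55, 60, 57]
print(psnr(reference, reconstructed))
```

Because the scale is logarithmic, a gain of 0.41 dB corresponds to an MSE that is lower by a factor of 10^(-0.041), or roughly 9%.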

  • Research Article
  • Citations: 6
  • 10.3390/ani14071106
A Serial Multi-Scale Feature Fusion and Enhancement Network for Amur Tiger Re-Identification
  • Apr 4, 2024
  • Animals
  • Nuo Xu + 9 more

The Amur tiger is an important endangered species, and its re-identification (re-ID) plays an important role in regional biodiversity assessment and wildlife resource statistics. This paper focuses on Amur tiger re-ID based on visible light images from screenshots of surveillance videos or camera traps, aiming to solve the problem of low accuracy caused by camera perspective, background noise, changes in motion posture, and deformation of Amur tiger body patterns during the re-ID process. To overcome this challenge, we propose a serial multi-scale feature fusion and enhancement re-ID network for the Amur tiger, in which global and local branches are constructed. Specifically, we design a global inverted pyramid multi-scale feature fusion method in the global branch to effectively fuse multi-scale global features and achieve high-level, fine-grained, and deep semantic feature preservation. We also design a local dual-domain attention feature enhancement method in the local branch, further enhancing local feature extraction and fusion by dividing local feature blocks. Based on this model structure, we evaluated the effectiveness and feasibility of the model on the public Amur Tiger Re-identification in the Wild (ATRW) dataset and achieved good results on mAP, Rank-1, and Rank-5, demonstrating competitive performance. In addition, since our proposed model requires neither additional expensive annotation information nor other pre-training modules, it has important advantages such as strong transferability and simple training.

  • Research Article
  • Citations: 12
  • 10.1007/s11063-024-11680-3
Image Classification Based on Low-Level Feature Enhancement and Attention Mechanism
  • Aug 13, 2024
  • Neural Processing Letters
  • Yong Zhang + 3 more

Deep learning-based image classification networks heavily rely on the extracted features. However, as the model becomes deeper, important features may be lost, resulting in decreased accuracy. To tackle this issue, this paper proposes an image classification method that enhances low-level features and incorporates an attention mechanism. The proposed method employs EfficientNet as the backbone network for feature extraction. Firstly, the Feature Enhancement Module quantifies and statistically processes low-level features from shallow layers, thereby enhancing the feature information. Secondly, the Convolutional Block Attention Module enhances the high-level features to improve the extraction of global features. Finally, the enhanced low-level features and global features are fused to supplement low-resolution global features with high-resolution details, further improving the model’s image classification ability. Experimental results illustrate that the proposed method achieves a Top-1 classification accuracy of 86.49% and a Top-5 classification accuracy of 96.90% on the ETH-Food101 dataset, 86.99% and 97.24% on the VireoFood-172 dataset, and 70.99% and 92.73% on the UEC-256 dataset. These results demonstrate that the proposed method outperforms existing methods in terms of classification performance.
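
The Top-1 and Top-5 figures above follow the usual definition: a prediction counts as correct if the true label is among the k highest-scoring classes. The following is a minimal sketch of that metric; the function name and the toy scores are hypothetical, not taken from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scored classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and aligned")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Three samples, three classes (toy values).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

Top-k accuracy can never decrease as k grows, which is why the Top-5 numbers reported above are always higher than the Top-1 numbers.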

  • Conference Article
  • 10.1109/icarce55724.2022.10046452
Dual-semantic Graph Similarity Learning for Image-text Matching
  • Dec 16, 2022
  • Wenxin Tan + 3 more

Image-text matching has received increasing attention because it enables the interaction between vision and language. Existing approaches have two limitations. First, most existing methods only pay attention to learning paired samples, ignoring the similar semantic information in the same modality. Second, the current methods lack interaction between local and global features, resulting in the mismatch of certain image regions or words due to the lack of global information. To solve the above problems, we propose a new dual semantic graph similarity learning (DSGSL) network, which consists of a feature enhancement module for learning compact features and a feature alignment module that learns the relations between global and local features. In the feature enhancement module, similar samples are processed as a graph, and a graph convolutional network is used to extract similar features to reconstruct the global feature representation. In addition, we use a gated fusion network to obtain discriminative sample representations by selecting salient features from other modalities and filtering out insignificant information. In the feature alignment module, we construct a dual semantic graph for every sample to learn the association between local features and global features. Extensive experiments on MS-COCO and Flickr30K show that our approach achieves state-of-the-art performance.

  • Research Article
  • 10.1109/tai.2025.3601594
MFENet: Multi-information Feature Enhancement Network for Vehicle Re-identification
  • Jan 1, 2025
  • IEEE Transactions on Artificial Intelligence
  • Zhangwei Li + 5 more

Vehicle re-identification (Re-ID) targets cross-camera image retrieval and is a widely used technology in intelligent transportation systems. Current Re-ID methods primarily enhance feature extraction by focusing on either global or local features, but they often fail to effectively leverage diverse information. To address these limitations, we propose a multi-information feature enhancement network (MFENet) that integrates diverse information types to enhance feature representation and boost model accuracy. Specifically, (1) a coarse-grained feature enhancement (CFE) module is employed to remove background influence on image features. This module filters the background, enabling the network model to extract more accurate vehicle features, such as color and model. (2) A fine-grained feature enhancement (FFE) module collects detailed information about vehicles by extracting features from subtle areas (e.g., vehicle lights and rearview mirrors) of an image, providing more unique clues about the vehicle. (3) A latent feature enhancement (LFE) module is designed to mine latent features and enrich vehicle features using non-visual cues, such as the vehicle’s camera and orientation, without relying on image information. Extensive experiments on vehicle Re-ID datasets demonstrate that MFENet outperforms most existing methods.

  • Book Chapter
  • 10.1007/978-3-030-88004-0_31
ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection
  • Jan 1, 2021
  • Chen Song + 3 more

Object detection is one of the most fundamental tasks in image content understanding due to its wide range of real-world applications. Although numerous algorithms have been proposed, effective and efficient object detection remains very challenging, especially in restricted situations involving multi-size objects and weak semantic information. In this paper, we propose a feature information-interaction visual attention model for multi-layer feature fusion and enhancement, which utilizes channel information to weight self-attentive feature maps, completing the extraction, fusion and enhancement of global semantic features with local contextual information of the object. Additionally, we propose an adaptively cyclic feature information-interaction model, which adopts branch prediction to decide the number of visual attention iterations, accomplishing adaptive fusion of global semantic features and local fine-grained information. Extensive experiments on the benchmark datasets PASCAL VOC and MS COCO show that our method achieves significant improvements over the baseline model.

  • Research Article
  • Citations: 30
  • 10.1109/tcsvt.2022.3144186
Discriminative Feature Mining and Enhancement Network for Low-Resolution Fine-Grained Image Recognition
  • Aug 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Tiantian Yan + 4 more

Existing fine-grained image recognition methods struggle to learn complete discriminative features from low-resolution (LR) data, because the original subtle inter-class distinctions become slimmer as the image resolution decreases. Besides, existing methods of LR fine-grained image recognition and general LR image recognition only consider the restoration and extraction of global discriminative features, ignoring that unreliable local fine-grained details can be detrimental to final recognition. To address the above problems, we propose a multi-tasking framework, discriminative feature mining and enhancement network (DME-Net), for the LR fine-grained image recognition task, which aims to capture reliable object descriptions from macro and micro perspectives, respectively. Macroscopically, we train the framework’s ability to recover and extract global discriminative features based on the whole images. Microscopically, we purposefully reinforce the framework’s ability to repair and capture the local discriminative details on the mined informative parts. To precisely excavate the most promising parts, we design an informative part mining (IPM) module, in which we first employ a part generation layer to predict several part masks that focus on different discriminative parts under the guidance of discrepancy loss and discriminant loss. Then we introduce a part selection (PS) submodule to further screen out a group of the most informative parts from the predicted part masks according to their corresponding scores, which measure the degree of semantic correlation of each part with the others. Experimental results on three benchmark datasets and one retail product dataset consistently show that our proposed framework can significantly boost the performance of the baseline model. Besides, extensive ablation studies are conducted, which further demonstrate the effectiveness of each component of our design.

  • Research Article
  • 10.3390/f16081288
Shrub Extraction in Arid Regions Based on Feature Enhancement and Transformer Network from High-Resolution Remote Sensing Images
  • Aug 7, 2025
  • Forests
  • Hao Liu + 11 more

The shrubland ecosystems in arid areas are highly sensitive to global climate change and human activities. Accurate extraction of shrubs using computer vision techniques plays an essential role in monitoring ecological balance and desertification. However, shrub extraction from high-resolution GF-2 satellite images remains challenging due to the shrubs' dense distribution and small size, along with complex backgrounds. Therefore, this study introduces a Feature Enhancement and Transformer Network (FETNet) that integrates a Feature Enhancement Module (FEM) and a Transformer module (EdgeViT). Together, they strengthen both global and local features and enable accurate segmentation of small shrubs in complex backgrounds. The ablation experiments demonstrated that incorporating FEM and EdgeViT improves the overall segmentation accuracy, with a 1.19% improvement in Mean Intersection over Union (MIOU). Comparison experiments show that FETNet outperforms the two leading models, FCN8s and SegNet, with MIOU improvements of 7.2% and 0.96%, respectively. The spatial details of the extracted results indicate that FETNet is able to accurately extract dense, small shrubs while effectively suppressing interference from roads and building shadows. The proposed FETNet enables precise shrub extraction in arid areas and can support ecological assessment and land management.
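
The MIOU metric quoted above averages, over classes, the ratio of intersection to union between predicted and ground-truth pixels. The sketch below shows one way to compute it on flattened label lists; the function name and the choice to skip classes absent from both inputs are assumptions, since the abstract does not describe the exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two equal-length label lists."""
    if len(pred) != len(target):
        raise ValueError("label lists must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both lists, and pixels labelled c in either.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes absent from both
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either input")
    return sum(ious) / len(ious)

# Toy example with background (0) and shrub (1) labels.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, num_classes=2))
```

In this toy case each class has two overlapping pixels out of a union of four, so both per-class IoUs and their mean are 0.5.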