Global feature enhancement and skip-connected fusion for grasping detection
- Research Article
4
- 10.1016/j.ecoinf.2025.103029
- May 1, 2025
- Ecological Informatics
Multiscale feature fusion and enhancement in a transformer for the fine-grained visual classification of tree species
- Research Article
5
- 10.1007/s11063-024-11547-7
- Mar 6, 2024
- Neural Processing Letters
Reinforcement of motion features is necessary in action recognition tasks. In this work, we propose an efficient feature reinforcement model, termed Feature Selection and Enhancement Networks (FEASE-Net). The core of FEASE-Net is the FEASE module, which adaptively captures input features at multiple scales and reinforces them globally. The FEASE module is composed of two sub-modules, Feature Selection (FS) and Feature Enhancement (FE). FS focuses on adaptive attention and selection of input features through a multi-scale structure with an attention mechanism, and FE employs channel attention to enhance globally useful feature information. To assess the effectiveness of FEASE-Net, we conduct extensive experiments on two benchmark datasets, Kinetics 400 and Something-Something V2. FEASE-Net achieves competitive performance compared with previous state-of-the-art methods that use similar backbones.
- Research Article
17
- 10.1038/s41598-025-89096-9
- Feb 11, 2025
- Scientific Reports
Medical image segmentation plays a crucial role in addressing emerging healthcare challenges. Although several deep learning architectures based on convolutional neural networks (CNNs) and Transformers have recently demonstrated remarkable performance, there is still room for improvement because of their inherent limitations in capturing feature correlations of input data. To address this issue, this paper proposes a novel encoder-decoder architecture called MedFuseNet that fuses local and global deep feature representations with hybrid attention mechanisms for medical image segmentation. More specifically, the proposed approach contains two parallel branches for feature learning: one leverages CNNs to learn local correlations of input data, and the other uses a Swin-Transformer to capture global contextual correlations of input data. For feature fusion and enhancement, the designed hybrid attention mechanisms combine four different attention modules: (1) an atrous spatial pyramid pooling (ASPP) module for the CNN branch, (2) a cross attention module in the encoder for fusing local and global features, (3) an adaptive cross attention (ACA) module in skip connections for further fusion, and (4) a squeeze-and-excitation attention (SE-attention) module in the decoder for highlighting informative features. We evaluate the proposed approach on the public ACDC and Synapse datasets, where it achieves average Dice similarity coefficient (DSC) scores of 89.73% and 78.40%, respectively. Experimental results on these two datasets demonstrate the effectiveness of the proposed approach on medical image segmentation tasks, outperforming the other state-of-the-art approaches compared.
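The DSC figures quoted above are overlap scores between a predicted mask and a ground-truth mask. As a minimal sketch (not the authors' evaluation code), the standard Dice formula 2|A∩B| / (|A| + |B|) can be computed for flattened binary masks as follows; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flattened binary masks.

    `pred` and `target` are equal-length sequences of 0/1 values.
    Hypothetical helper for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(round(dice_coefficient(pred, target), 4))
```

In multi-class benchmarks such as those named above, a score like this is typically computed per class and then averaged; the exact averaging protocol is not stated in the abstract.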
- Conference Article
8
- 10.1109/icmew53276.2021.9455955
- Jul 5, 2021
As a common low-level visual task, image dehazing is widely used in various multimedia fields. However, it faces great challenges due to its ill-posed nature, especially when removing dense haze. In this paper, we propose an efficient end-to-end global feature fusion attention network for single image dehazing based on an encoder-decoder structure. We design a Multi-scale Global Context Fusion (MGCF) block to capture global context features and integrate them into the network to assist in recovering dense haze areas. After the global features are aggregated in the context modeling module, a two-stream structure combines two feature transformation modes and two feature fusion modes to obtain more comprehensive and accurate features. We simplify the pixel attention block and combine it with the MGCF block to form a Global Feature Fusion Attention (GFFA) module. Based on local residual learning and the GFFA module, the Feature Enhancement (FE) module is proposed as the basic module of the encoder-decoder, allowing the network to focus on enhancing the more useful features. Experimental results show that the proposed method surpasses state-of-the-art dehazing methods, both quantitatively and qualitatively.
- Research Article
21
- 10.1016/j.compag.2024.108635
- Jan 16, 2024
- Computers and Electronics in Agriculture
A Bagging-SVM field-road trajectory classification model based on feature enhancement
- Research Article
7
- 10.1016/j.engappai.2023.107269
- Oct 11, 2023
- Engineering Applications of Artificial Intelligence
Visual Tracking based on deformable Transformer and spatiotemporal information
- Research Article
51
- 10.1016/j.aei.2023.101898
- Jan 1, 2023
- Advanced Engineering Informatics
Multi-task spatio-temporal augmented net for industry equipment remaining useful life prediction
- Research Article
- 10.3724/sp.j.1089.2023-00319
- Apr 1, 2025
- Journal of Computer-Aided Design & Computer Graphics
Deep super-resolution reconstruction networks have a large number of parameters and slow inference speed, while lightweight networks are unable to express deep image features under complex environmental conditions. To address these issues, a lightweight reversible super-resolution network based on feature enhancement is proposed. First, edge feature residuals are introduced and combined with the proposed edge similarity loss to guide model reconstruction and enhance the expression of texture contours. Then, a new wavelet feature kernel is added to support reconstruction with arbitrary scaling factors. Finally, a global feature extraction module is introduced, which embeds a self-attention mechanism in the feature map to extract global features. The proposed network outperforms SwinIR-light on the Set5 benchmark with a scaling factor of 4: it improves the PSNR by 0.41 dB, reduces the number of parameters by 244k, and reduces the inference time by 49.05%.
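For readers unfamiliar with the PSNR metric quoted above, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), on flattened pixel sequences. The function name `psnr` and the default peak value of 255 are assumptions for illustration; super-resolution papers often evaluate on a cropped luminance channel, which this sketch does not model.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Hypothetical helper; `max_value` is the assumed peak pixel value.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [52, 55, 61, 59]
    rec = [54, 55, 60, 57]
    print(round(psnr(ref, rec), 2), "dB")
```

Because the scale is logarithmic, a gain of 0.41 dB corresponds to an MSE ratio of 10^(-0.041), roughly 0.91, that is, about 9% lower mean squared error at the same peak value.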
- Research Article
6
- 10.3390/ani14071106
- Apr 4, 2024
- Animals
The Amur tiger is an important endangered species, and its re-identification (re-ID) plays an important role in regional biodiversity assessment and wildlife resource statistics. This paper focuses on Amur tiger re-ID based on visible light images taken from surveillance video screenshots or camera traps, aiming to solve the problem of low accuracy caused by camera perspective, background noise, changes in motion posture, and deformation of Amur tiger body patterns during the re-ID process. To overcome these challenges, we propose a serial multi-scale feature fusion and enhancement re-ID network for the Amur tiger, in which global and local branches are constructed. Specifically, we design a global inverted pyramid multi-scale feature fusion method in the global branch to effectively fuse multi-scale global features and preserve high-level, fine-grained, and deep semantic features. We also design a local dual-domain attention feature enhancement method in the local branch, which further enhances local feature extraction and fusion by dividing local feature blocks. We evaluated the effectiveness and feasibility of the model on the public Amur Tiger Re-identification in the Wild (ATRW) dataset and achieved competitive results on mAP, Rank-1, and Rank-5. In addition, since the proposed model requires neither additional expensive annotation information nor other pre-training modules, it offers strong transferability and simple training.
- Research Article
12
- 10.1007/s11063-024-11680-3
- Aug 13, 2024
- Neural Processing Letters
Deep learning-based image classification networks heavily rely on the extracted features. However, as the model becomes deeper, important features may be lost, resulting in decreased accuracy. To tackle this issue, this paper proposes an image classification method that enhances low-level features and incorporates an attention mechanism. The proposed method employs EfficientNet as the backbone network for feature extraction. Firstly, the Feature Enhancement Module quantifies and statistically processes low-level features from shallow layers, thereby enhancing the feature information. Secondly, the Convolutional Block Attention Module enhances the high-level features to improve the extraction of global features. Finally, the enhanced low-level features and global features are fused to supplement low-resolution global features with high-resolution details, further improving the model’s image classification ability. Experimental results illustrate that the proposed method achieves a Top-1 classification accuracy of 86.49% and a Top-5 classification accuracy of 96.90% on the ETH-Food101 dataset, 86.99% and 97.24% on the VireoFood-172 dataset, and 70.99% and 92.73% on the UEC-256 dataset. These results demonstrate that the proposed method outperforms existing methods in terms of classification performance.
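The Top-1 and Top-5 accuracies reported above follow the usual definition: a sample counts as correct if its true label is among the k highest-scoring classes. The sketch below illustrates that definition only; the function name `top_k_accuracy` and the toy scores are hypothetical and unrelated to the paper's data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes.

    `scores` is a list of per-class score lists, `labels` a list of class indices.
    Hypothetical helper for illustration only.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))
    print(top_k_accuracy(scores, labels, k=2))
```

Top-k accuracy can never decrease as k grows, which is why the Top-5 figures above are always at least as high as the corresponding Top-1 figures.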
- Conference Article
- 10.1109/icarce55724.2022.10046452
- Dec 16, 2022
Image-text matching has received increasing attention because it enables interaction between vision and language. Existing approaches have two limitations. First, most existing methods only learn from paired samples, ignoring similar semantic information within the same modality. Second, current methods lack interaction between local and global features, so certain image regions or words are mismatched for want of global information. To solve these problems, we propose a new dual semantic graph similarity learning (DSGSL) network, which consists of a feature enhancement module for learning compact features and a feature alignment module that learns the relations between global and local features. In the feature enhancement module, similar samples are processed as a graph, and a graph convolutional network is used to extract similar features to reconstruct the global feature representation. In addition, we use a gated fusion network to obtain discriminative sample representations by selecting salient features from other modalities and filtering out insignificant information. In the feature alignment module, we construct a dual semantic graph for every sample to learn the association between local features and global features. Extensive experiments on MS-COCO and Flickr30K show that our approach achieves state-of-the-art performance.
- Research Article
- 10.1109/tai.2025.3601594
- Jan 1, 2025
- IEEE Transactions on Artificial Intelligence
Vehicle re-identification (Re-ID) targets cross-camera image retrieval and is a widely used technology in intelligent transportation systems. Current Re-ID methods primarily enhance feature extraction by focusing on either global or local features, but they often fail to effectively leverage diverse information. To address these limitations, we propose a multi-information feature enhancement network (MFENet) that integrates diverse information types to enhance feature representation and boost model accuracy. Specifically, (1) a coarse-grained feature enhancement (CFE) module is employed to remove background influence on image features. This module filters the background, enabling the network model to extract more accurate vehicle features, such as color and model. (2) A fine-grained feature enhancement (FFE) module collects detailed information about vehicles by extracting features from subtle areas (e.g., vehicle lights and rearview mirrors) of an image, providing more unique clues about the vehicle. (3) A latent feature enhancement (LFE) module is designed to mine latent features and enrich vehicle features using non-visual cues, such as the vehicle’s camera and orientation, without relying on image information. Extensive experiments on vehicle Re-ID datasets demonstrate that MFENet outperforms most existing methods.
- Book Chapter
- 10.1007/978-3-030-88004-0_31
- Jan 1, 2021
Object detection is one of the most fundamental tasks in image content understanding due to its wide real-world applications. Although numerous algorithms have been proposed, effective and efficient object detection remains challenging, especially in restricted situations involving multi-size objects and weak semantic information. In this paper, we propose a feature information-interaction visual attention model for multi-layer feature fusion and enhancement, which uses channel information to weight self-attentive feature maps, completing the extraction, fusion, and enhancement of global semantic features with local contextual information of the object. We also propose an adaptively cyclic feature information-interaction model, which adopts branch prediction to decide the number of visual attention cycles, accomplishing adaptive fusion of global semantic features and local fine-grained information. Extensive experiments on the benchmark datasets PASCAL VOC and MS COCO show that our method achieves significant improvements over the baseline model.
- Research Article
30
- 10.1109/tcsvt.2022.3144186
- Aug 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Existing fine-grained image recognition methods have difficulty learning complete discriminative features from low-resolution (LR) data, because the originally subtle inter-class distinctions become even subtler as image resolution decreases. Moreover, existing methods for LR fine-grained image recognition and general LR image recognition only consider the restoration and extraction of global discriminative features, ignoring the fact that unreliable local fine-grained details can be detrimental to final recognition. To address these problems, we propose a multi-tasking framework, the discriminative feature mining and enhancement network (DME-Net), for the LR fine-grained image recognition task, which aims to capture reliable object descriptions from both macro and micro perspectives. Macroscopically, we train the framework to recover and extract global discriminative features from whole images. Microscopically, we reinforce the framework's ability to repair and capture local discriminative details on the mined informative parts. To precisely locate the most promising parts, we design an informative part mining (IPM) module, in which we first employ a part generation layer to predict several part masks that focus on different discriminative parts under the guidance of a discrepancy loss and a discriminant loss. We then introduce a part selection (PS) submodule to further screen out a group of the most informative parts from the predicted part masks according to their scores, which measure the degree of semantic correlation between each part and the others. Experimental results on three benchmark datasets and one retail product dataset consistently show that the proposed framework significantly boosts the performance of the baseline model. Extensive ablation studies further confirm the effectiveness of each component of the design.
- Research Article
- 10.3390/f16081288
- Aug 7, 2025
- Forests
Shrubland ecosystems in arid areas are highly sensitive to global climate change and human activities. Accurate extraction of shrubs using computer vision techniques plays an essential role in monitoring ecological balance and desertification. However, shrub extraction from high-resolution GF-2 satellite images remains challenging because shrubs are small and densely distributed and the background is complex. Therefore, this study introduces a Feature Enhancement and Transformer Network (FETNet) that integrates a Feature Enhancement Module (FEM) and a Transformer module (EdgeViT). Together, these modules strengthen both global and local features and enable accurate segmentation of small shrubs in complex backgrounds. Ablation experiments demonstrate that incorporating FEM and EdgeViT improves overall segmentation accuracy, with a 1.19% improvement in Mean Intersection over Union (MIOU). Comparison experiments show that FETNet outperforms two leading models, FCN8s and SegNet, with MIOU improvements of 7.2% and 0.96%, respectively. The spatial details of the extracted results indicate that FETNet accurately extracts dense, small shrubs while effectively suppressing interference from roads and building shadows. The proposed FETNet enables precise shrub extraction in arid areas and can support ecological assessment and land management.
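The MIOU figures above average the per-class Intersection over Union between predicted and reference label maps. The following is a minimal sketch of that common definition, not the study's evaluation code; the function name `mean_iou` and the choice to skip classes absent from both maps are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flattened label maps.

    Classes that appear in neither map are skipped (an assumed convention).
    Hypothetical helper for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("label maps must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps, and in at least one map.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) is present")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # 0 = background, 1 = shrub (toy labels, not data from the study)
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    print(round(mean_iou(pred, target, num_classes=2), 4))
```

In a two-class setting such as shrub versus background, small objects contribute few pixels, so a handful of misclassified pixels can move the foreground IoU noticeably.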