IDNet: A Single-Shot Object Detector Based on Feature Fusion
This paper proposes a novel single shot network for object detection. The proposed network, termed IDNet, explores the strategies of the feature fusion to alleviate the scale variation problem in object detection. IDNet mainly consists of two feature fusion modules: an indirect feature fusion module (IF) and a direct feature fusion module (DF). The IF shares long-range dependencies within pyramidal layers and based on these information, IDNet learns to emphasize informative regions and suppress the less useful ones on each layer. The DF is a feature fusion strategy based on modified lateral connection inspired by feature pyramid networks (FPN). It utilizes the averaging operation to reduce the change of feature maps' order of magnitude during fusing features to further improve the performance for detecting small instances. Comprehensive experiments are performed and the results indicate the effectiveness of IDNet, which reaches 80.3 mAP on PASCAL VOC 2007 benchmark.
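The abstract does not spell out the DF module's layers, so the following is only a minimal sketch of the general idea it describes: fusing a lateral feature map with an upsampled top-down map by averaging rather than summing, so the fused values stay in the same order of magnitude as the inputs. The helper names `upsample_nearest_2x` and `fuse`, the nearest-neighbor upsampling, and the toy values are assumptions made for illustration, not the authors' implementation.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(lateral, top, mode="average"):
    """Fuse a fine lateral map with a coarser top-down map (half its size).

    mode="sum" mimics a plain FPN-style lateral connection;
    mode="average" halves the sum so magnitudes do not grow during fusion.
    """
    up = upsample_nearest_2x(top)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("top map must be exactly half the lateral map's size")
    scale = 0.5 if mode == "average" else 1.0
    return [[scale * (a + b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(lateral, up)]


# Toy single-channel feature maps (hypothetical values).
lateral = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
top = [[2.0, 4.0],
       [2.0, 4.0]]

summed = fuse(lateral, top, mode="sum")
averaged = fuse(lateral, top, mode="average")
print(summed[0])
print(averaged[0])
```

In this toy case the summed map grows beyond the range of either input, while the averaged map stays within it, which is the effect the abstract attributes to the averaging operation.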
- Research Article
13
- 10.1155/2021/6685954
- Jan 1, 2021
- Computational Intelligence and Neuroscience
In order to alleviate the scale variation problem in object detection, many feature pyramid networks have been developed. In this paper, we rethink the issues existing in current methods and design a more effective module for feature fusion, called multiflow feature fusion module (MF3M). We first construct gate modules and multiple information flows in MF3M to avoid information redundancy and enhance the completeness and accuracy of information transfer between feature maps. Furthermore, in order to reduce the discrepancy between classification and regression in object detection, a modified deformable convolution, termed task adaptive convolution (TaConv), is proposed in this study. Different offsets and masks are predicted to achieve the disentanglement of features for classification and regression in TaConv. By integrating the above two designs, we build a novel feature pyramid network with feature fusion and disentanglement (FFAD) which can mitigate the scale misalignment and task misalignment simultaneously. Experimental results show that FFAD can boost the performance in most models.
- Conference Article
1
- 10.1109/icma.2018.8484571
- Aug 1, 2018
Feature Pyramid Network (FPN) is one of the best object detection algorithms in the current object detection field, which uses a convolutional neural network (CNN) to detect objects of different scales in an image. However, FPN's feature fusion method ignores the influence of the consecutive feature, which hinders the information flow. In this paper, we propose an end-to-end image detection model called CFN (Consecutive Feature Network) to overcome this problem and speed up the detection process. At equal accuracy, the novel feature fusion method we propose detects faster than other methods. In the feature fusion module, features from consecutive layers with different scales are merged instead of compartmental layers, and the merged features are fed to the classification and regression subnets to predict the final detection results. On the PASCAL VOC 2007 test set, without any data augmentation techniques during training, our proposed network achieves 77.1 mAP (mean average precision) at a speed of 3.9 FPS (frames per second) on a single Nvidia 1080Ti GPU. Code will be made publicly available.
- Research Article
14
- 10.3390/app12010107
- Dec 23, 2021
- Applied Sciences
In the coal mining process, various types of tramp materials will be mixed into the raw coal, which will affect the quality of the coal and endanger the normal operation of the equipment. Automatic detection of tramp material objects is an important process and basis for efficient coal sorting. However, previous research has focused on the detection of gangue, ignoring the detection of other types of tramp materials, especially small targets. Because the original Single Shot MultiBox Detector (SSD) does not use feature maps efficiently, it is difficult to obtain stable results when detecting tramp material objects. In this article, an object detection algorithm based on feature fusion and a dense convolutional network is proposed, called tramp materials in raw coal single-shot detector (TMRC-SSD), to detect five types of tramp materials: gangue, bolt, stick, iron sheet, and iron chain. In this algorithm, a modified DenseNet is first designed and a four-stage feature extractor is used to down-sample the feature map stably. After that, we use dilated convolution and a multi-branch structure to enrich the receptive field. Finally, in the feature fusion module, we design cross-layer feature fusion and attention fusion modules to realize the semantic interaction of feature maps. The experiments show that the modules we designed are effective and that the method outperforms the existing model. When the input image is 300 × 300 pixels, it reaches 96.12% mAP at 24 FPS. In particular, for small objects, the detection accuracy has increased by 4.1 to 95.57%. The experimental results show that this method can be applied to the actual detection of tramp material objects in raw coal.
- Research Article
54
- 10.1016/j.patcog.2023.110112
- Nov 13, 2023
- Pattern Recognition
Feature fusion method based on spiking neural convolutional network for edge detection
- Research Article
78
- 10.1109/tmm.2022.3143707
- Jan 1, 2023
- IEEE Transactions on Multimedia
Object detection methods based on Convolution Neural Networks (CNN) usually utilize feature pyramid networks to detect objects with various scales. The state-of-the-art feature pyramid networks improve detection accuracy by enhancing multi-level feature representations. Fusing multi-level features is the most effective manner to enhance the feature representations. However, the existing feature pyramid networks usually fuse multi-level features by element-wise operations. It leads to the lack of long-range dependencies in the feature fusion. To address the problem, we propose a simple yet efficient feature pyramid network named latent feature pyramid network (LFPN). LFPN can enhance the feature representations by modeling inner-scale and cross-scale long-range dependencies through conducting inner-scale and cross-scale feature fusion in the latent space. Comprehensive experiments are performed on two challenge object detection datasets: MS COCO and Pascal VOC. The experimental results show consistent improvements on various feature pyramid networks, backbones, and object detectors, which demonstrates the effectiveness and generality of our LFPN.
- Research Article
10
- 10.1016/j.compag.2023.108000
- Jun 22, 2023
- Computers and Electronics in Agriculture
An efficient multi-task convolutional neural network for dairy farm object detection and segmentation
- Research Article
- 10.1142/s021800142550020x
- Jul 31, 2025
- International Journal of Pattern Recognition and Artificial Intelligence
Object detection is widely used in many fields, and multi-scale feature extraction is crucial for accurate detection. The Feature Pyramid Network (FPN) is a commonly adopted feature extraction approach in object detection. Nevertheless, direct fusion between top-down feature layers in FPN leads to misalignment and loss of feature information. To address these limitations, this paper proposes a novel feature pyramid network based on FPN, termed SAR-FPN, which consists of two components: the Scale Adaptive Triple-Branch Module (SATM) and the Reverse Feature Fusion Module (RFFM). Specifically, SATM enhances the performance for large objects through its triple-branch design. It selects the appropriate branch based on object scale, assigns suitable receptive fields for objects of different sizes, and performs feature alignment. The RFFM addresses the issue of poor performance in small object detection by implementing a bottom-up feature fusion pathway. Extensive experiments on the PASCAL VOC 2012 and MS COCO 2017 datasets validated the effectiveness of SAR-FPN.
- Research Article
1
- 10.1088/1742-6596/2829/1/012016
- Sep 1, 2024
- Journal of Physics: Conference Series
Object detection, particularly for oriented small objects, benefits greatly from the generation of multi-scale feature maps by the feature fusion module. Despite the success of series of Feature Pyramid Network models in general object detection, their application in remote sensing object detection has received limited attention. Small oriented objects require contextual information for detection, while various types of objects need different long-range context. In this paper, we propose an intuitive and simple fusion module to be added in Feature Pyramid Network called Selective Spatial Feature Pyramid Network (SSFPN) to address these challenges. SSFPN dynamically adjusts the spatial receptive field across multi-scale feature maps, enhancing the modeling of contextual variations among different objects in remote sensing scenarios. In extensive experiments, SSFPN has achieved competitive results, i.e., an improvement of 0.34% after adding SSFPN to the Oriented RCNN model. The codes are available at https://github.com/Atlantisming/SSFPN.
- Research Article
49
- 10.3390/s22134933
- Jun 29, 2022
- Sensors (Basel, Switzerland)
COVID-19 is highly contagious, and proper wearing of a mask can hinder the spread of the virus. However, complex factors in natural scenes, including occlusion, dense, and small-scale targets, frequently lead to target misdetection and missed detection. To address these issues, this paper proposes a YOLOv5-based mask-wearing detection algorithm, YOLOv5-CBD. Firstly, the Coordinate Attention mechanism is introduced into the feature fusion process to stress critical features and decrease the impact of redundant features after feature fusion. Then, the original feature pyramid network module in the feature fusion module was replaced with a weighted bidirectional feature pyramid network to achieve efficient bidirectional cross-scale connectivity and weighted feature fusion. Finally, we combined Distance Intersection over Union with Non-Maximum Suppression to improve the missed detection of overlapping targets. Experiments show that the average detection accuracy of the YOLOv5-CBD model is 96.7%—an improvement of 2.1% compared to the baseline model (YOLOv5).
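The last step above, combining Distance Intersection over Union (DIoU) with Non-Maximum Suppression, can be sketched as follows. This is a generic formulation of DIoU-NMS under stated assumptions, not the YOLOv5-CBD code: boxes are `(x1, y1, x2, y2)` tuples, DIoU is taken as IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box, and the 0.5 threshold and the function names are chosen here for illustration only.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """IoU minus a penalty for the distance between box centers."""
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    d2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    # Squared diagonal of the smallest box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    penalty = d2 / c2 if c2 > 0 else 0.0
    return iou(a, b) - penalty


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box only if its DIoU with a kept box
    exceeds the threshold. Returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))
```

Because the center-distance penalty lowers the overlap score of boxes whose centers are far apart, two overlapping boxes that likely belong to different targets are less readily suppressed than under plain IoU-based NMS, which matches the stated goal of reducing missed detections among overlapping targets.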
- Conference Article
4
- 10.1145/3373509.3373529
- Oct 23, 2019
Attention mechanisms and feature pyramids have been widely used in various fields of deep learning in recent years. In particular, Feature Pyramid Network (FPN) has become a popular object detection network since it was put forward in 2017, and it is embedded into many well-known networks. However, FPN takes a suboptimal approach to fusing features and detects small objects on low-level features that are fused with high-level features containing redundant information. Very few articles discuss how features should be fused. In this paper we therefore propose a novel Attention-based Feature Pyramid Network (AFPN), which not only enables better integration of high-level and low-level feature maps but also adds accurate semantic information to low-level features. In particular, the AFPN consists of two modules: the Feature Fusion Module (FFM) and the Feature Enhance Module (FEM). Because our model is a lightweight and general module, it is end-to-end trainable along with base CNNs. We validate our AFPN through extensive experiments on the VOC and COCO detection datasets. Our experiments show consistent improvements in detection performance.
- Research Article
13
- 10.1080/01431161.2023.2261153
- Oct 2, 2023
- International Journal of Remote Sensing
Land Use/Land Cover (LULC) classification has become increasingly important in various fields, including ecological and environmental protection, urban planning, and geological disaster monitoring. With the development of high-resolution remote sensing satellite technology, there is a growing focus on achieving precise LULC classification. However, the accuracy of fine-grained LULC classification is challenged by the high intra-class diversity and low inter-class separability inherent in high-resolution remote sensing images. To address this challenge, this paper proposes a novel multi-path feature fusion semantic segmentation model, called MPFFNet, which combines the segmentation results of convolutional neural networks with traditional filtering processes to achieve finer LULC classification. MPFFNet consists of three modules: the Improved Encoder Module (IEM) extracts contextual and spatial detail information through the backbone network, DASPP, and MFEAM; the Improved Decoder Module (IDM) utilizes the Cascade Feature Fusion (CFF) module to effectively merge shallow and deep information; and the Feature Fusion Module (FAM) enables dual-path feature fusion using a convolutional neural network and Gabor Filter. Experimental results on the large-scale classification set and the fine land-cover classification set of the Gaofen Image Dataset (GID) demonstrate the effectiveness of the proposed method, achieving mIoU scores of 81.02% and 77.83%, respectively. These scores outperform U-Net by 7.95% and 3.28%, respectively. Therefore, we believe that our model can deliver superior results in the task of LULC classification.
- Research Article
- 10.3390/s24144545
- Jul 13, 2024
- Sensors (Basel, Switzerland)
Establishing an accurate and robust feature fusion mechanism is key to enhancing the tracking performance of single-object trackers based on a Siamese network. However, the output features of the depth-wise cross-correlation feature fusion module in fully convolutional trackers based on Siamese networks cannot establish global dependencies on the feature maps of a search area. This paper proposes a dynamic cascade feature fusion (DCFF) module by introducing a local feature guidance (LFG) module and dynamic attention modules (DAMs) after the depth-wise cross-correlation module to enhance the global dependency modeling capability during the feature fusion process. In this paper, a set of verification experiments is designed to investigate whether establishing global dependencies for the features output by the depth-wise cross-correlation operation can significantly improve the performance of fully convolutional trackers based on a Siamese network, providing experimental support for rational design of the structure of a dynamic cascade feature fusion module. Secondly, we integrate the dynamic cascade feature fusion module into the tracking framework based on a Siamese network, propose SiamDCFF, and evaluate it using public datasets. Compared with the baseline model, SiamDCFF demonstrated significant improvements.
- Conference Article
136
- 10.1109/cvpr46437.2021.01509
- Jun 1, 2021
Learning pyramidal feature representations is crucial for recognizing object instances at different scales. Feature Pyramid Network (FPN) is the classic architecture to build a feature pyramid with high-level semantics throughout. However, intrinsic defects in feature extraction and fusion inhibit FPN from further aggregating more discriminative features. In this work, we propose Attention Aggregation based Feature Pyramid Network (A²-FPN), to improve multi-scale feature learning through attention-guided feature aggregation. In feature extraction, it extracts discriminative features by collecting-distributing multi-level global context features, and mitigates the semantic information loss due to drastically reduced channels. In feature fusion, it aggregates complementary information from adjacent features to generate location-wise reassembly kernels for content-aware sampling, and employs channel-wise reweighting to enhance the semantic consistency before element-wise addition. A²-FPN shows consistent gains on different instance segmentation frameworks. By replacing FPN with A²-FPN in Mask R-CNN, our model boosts the performance by 2.1% and 1.6% mask AP when using ResNet-50 and ResNet-101 as backbone, respectively. Moreover, A²-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
- Research Article
3
- 10.1142/s0218213022500282
- Mar 1, 2022
- International Journal on Artificial Intelligence Tools
This paper proposes a high-performance framework for accurate multi-stage object detection in low-altitude based UAV images. The proposed system employs a cascade style architecture with increasing thresholds for achieving accurate detection. The framework makes use of highly efficient Feature Pyramid Networks (FPNs) to detect objects of small sizes, and various scales which are the main challenge in low-altitude aerial images. FPNs aim to resolve scale variation problems in object detection by combining features of multiple levels. The experiments have been performed on a complex low-altitude aerial dataset VisDrone which has multiple categories of classes. The FPN-Cascade detector has been supported by slicing the data horizontally and vertically that resulted in an advancement of 8% mAP when compared with the base detector. The experiments compare the FPN-Cascade performance on the standard as well as augmented VisDrone dataset. A concrete methodology about the training process, hyperparameter tuning, and performance evaluation methods for Cascade RCNN on the VisDrone dataset is highlighted. The proposed framework achieves state of the art 30.04% mAP value on the VisDrone dataset.
- Research Article
14
- 10.1016/j.neucom.2024.127809
- May 9, 2024
- Neurocomputing
Weighted parallel decoupled feature pyramid network for object detection