Sonar image object detection is essential in underwater rescue and resource exploration. Although many convolutional neural network (CNN)-based object detection algorithms have achieved great success on natural images, underwater sonar images pose considerable challenges to accurate detection, such as seabed reverberation noise, a low proportion of foreground object pixels, and poor imaging resolution. To address these problems, we propose a novel sonar image object detector called the multilevel feature fusion network (MLFFNet). The detector consists of a multiscale convolution module (MS-Conv), a multilevel feature extraction module (ML-FEM), a multilevel feature fusion module (ML-FFM), a neighborhood channel attention mechanism (N-CAM), a multiscale feature pyramid module (MS-FPN), and a feature association module (FA). First, the MS-Conv extracts feature information at different scales in the object region. Second, the ML-FEM and ML-FFM obtain local detail and global context features. Third, the N-CAM and MS-FPN obtain the foreground objects' semantic and position features while suppressing background noise interference. Finally, the FA module enhances the category and feature correlation among different objects. Extensive experiments are conducted on a real-scene sonar image dataset. The experimental results demonstrate that MLFFNet performs better than other state-of-the-art object detection methods. Code and dataset are publicly available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/darkseid-arch/SonarMLFFNet</uri>.
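The abstract does not specify how the N-CAM weights channels, so the following is only a minimal, dependency-free sketch of a generic channel-attention step of the kind such modules typically build on: each channel of a feature map is squeezed to its global mean, passed through a sigmoid excitation, and used to rescale that channel (so channels dominated by near-zero background responses are attenuated). The function name `channel_attention` and the `gamma` scaling parameter are hypothetical, not from the paper.

```python
import math

def channel_attention(features, gamma=1.0):
    """Rescale each channel by a sigmoid of its global mean activation.

    features: list of C channels, each an H x W nested list of floats.
    gamma: hypothetical temperature controlling the excitation sharpness.
    Returns a new list of channels with the same shape.
    """
    weights = []
    for ch in features:
        # Squeeze: global average pool over the spatial dimensions.
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        # Excitation: sigmoid gate in (0, 1).
        weights.append(1.0 / (1.0 + math.exp(-gamma * mean)))
    # Scale: reweight every spatial position of each channel.
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(features, weights)]

# A strongly responding (foreground-like) channel keeps most of its energy,
# while a flat near-zero (background-like) channel is damped toward half weight.
fmap = [[[2.0, 2.0], [2.0, 2.0]],   # high-activation channel
        [[0.0, 0.0], [0.0, 0.0]]]   # flat background channel
out = channel_attention(fmap)
```

This is the squeeze-and-excite pattern in its simplest form; the paper's N-CAM presumably adds neighborhood interactions between channels, which are omitted here.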