Abstract

Object detection is a fundamental component of autonomous driving systems, and with the rise of transformers in recent years, numerous computer vision works have integrated transformers into object detectors to obtain better generalization ability. Building a pure transformer-based detector seems attractive; however, transformers are not a panacea, and they come with serious drawbacks. Their fundamental operator, multi-head self-attention (MHSA), is computationally expensive due to its quadratic complexity in the number of tokens, which demands unreasonably high memory usage and yields critically low throughput. To address this issue, we use a convolution operation to simulate MHSA, transferring the philosophy and principle of MHSA to convolutional neural networks (CNNs). This yields a detector that offers both accuracy and speed. Furthermore, a multi-scale pyramidal feature extractor gives the detector a better view across various object scales. Overall, our proposed object detector follows the philosophy of the attention mechanism: a multi-scale feature-pyramid CNN encoder simulates the transformer, and a genuine transformer query neck extracts all objects at once and feeds them to the output heads. Trained on the COCO 2017 dataset, and combining the construction philosophy of the object detector with the characteristics of the transformer, our FPDT-Tiny achieves an average precision (AP) of up to 34.1 within a short 150-epoch schedule, which is 16.0 and 10.8 points higher than the CNN-based YOLOv3-Base and SSD-300, respectively. Under the same schedule, our FPDT-Small reaches an AP of up to 37.7, which is 10.4 and 7.9 points higher than the transformer-based YOLOS-Small and DETR-ResNet-152, respectively, demonstrating competitive performance.
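The abstract describes replacing quadratic-cost MHSA with a convolutional approximation. Since the paper's actual FPDT architecture is not reproduced here, the following is a minimal, hypothetical PyTorch sketch of the general idea only: a transformer-style block whose O((HW)^2) attention mixer is swapped for a depthwise convolution with O(HW·k^2) cost. The module names and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (NOT the paper's FPDT implementation): a transformer-style
# block that uses a depthwise convolution as its token mixer in place of MHSA.
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    """Residual block whose spatial mixer is a depthwise conv (linear cost in
    the number of spatial positions) instead of quadratic self-attention."""
    def __init__(self, dim: int, kernel_size: int = 7, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        # Depthwise conv aggregates spatial context per channel, playing the
        # spatial-mixing role of self-attention at O(HW * k^2) cost.
        self.spatial_mix = nn.Conv2d(dim, dim, kernel_size,
                                     padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.BatchNorm2d(dim)
        # Pointwise convs stand in for the transformer's channel MLP.
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.spatial_mix(self.norm1(x))  # attention-like spatial mixing
        x = x + self.mlp(self.norm2(x))          # channel-wise feed-forward
        return x

if __name__ == "__main__":
    block = ConvMixerBlock(dim=64)
    feats = torch.randn(1, 64, 32, 32)  # (batch, channels, H, W) feature map
    print(block(feats).shape)           # torch.Size([1, 64, 32, 32])
```

Under these assumptions, the memory footprint grows linearly with the feature-map area rather than quadratically with the token count, which is the trade-off the abstract motivates.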
