Abstract

The Single Shot MultiBox Detector (SSD) is one of the fastest algorithms in the field of target detection. Although it achieves good detection results, it suffers from weak feature extraction in its shallow layers and feature loss in its deep layers. In this paper, we propose an accurate and efficient target detection method, Single Shot Object Detection with Feature Enhancement and Fusion (FFESSD), which enhances and exploits both the shallow and deep features in the feature pyramid of the SSD. To achieve this, we introduce a Feature Fusion Module and two Feature Enhancement Modules and integrate them into the conventional SSD architecture. Experimental results on the PASCAL VOC 2007 dataset demonstrate that FFESSD achieves 79.1% mean average precision (mAP) at 54.3 frames per second (FPS) with a 300 × 300 input, while FFESSD with a 512 × 512 input achieves 81.8% mAP at 30.2 FPS. The proposed network achieves state-of-the-art mAP, outperforming the conventional SSD, the Deconvolutional Single Shot Detector (DSSD), Feature-Fusion SSD (FSSD), and other advanced detectors. In extended experiments, FFESSD also detected fuzzy targets better than the conventional SSD.
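The abstract does not describe the internals of the Feature Fusion Module or the Feature Enhancement Modules. As a rough illustration of the general idea behind feature-pyramid fusion in SSD variants, the sketch below upsamples a deep, low-resolution feature map to the size of a shallow map, concatenates the two along the channel axis, and mixes them with a 1 × 1 convolution followed by ReLU. It is a minimal pure-Python sketch under these assumptions, not the FFESSD module: the function names are hypothetical, and nearest-neighbour upsampling requires an integer size ratio, whereas real SSD300 pyramid levels such as 19 × 19 and 10 × 10 would need interpolation or padding.

```python
from typing import List

# Hypothetical layout: a feature map is a list of channels, each a list of rows.
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every pixel factor x factor times (nearest-neighbour upsampling)."""
    out = []
    for channel in fmap:
        rows = []
        for y in range(len(channel) * factor):
            src = channel[y // factor]
            rows.append([src[x // factor] for x in range(len(src) * factor)])
        out.append(rows)
    return out


def conv1x1_relu(fmap: FeatureMap, weights: List[List[float]],
                 bias: List[float]) -> FeatureMap:
    """1x1 convolution (a per-pixel linear map across channels), then ReLU."""
    height, width = len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row, b in zip(weights, bias):  # one output channel per weight row
        out.append([
            [max(0.0, b + sum(w * fmap[c][y][x] for c, w in enumerate(w_row)))
             for x in range(width)]
            for y in range(height)
        ])
    return out


def fuse(shallow: FeatureMap, deep: FeatureMap,
         weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Upsample the deep map, concatenate it with the shallow map, then mix."""
    factor = len(shallow[0]) // len(deep[0])
    deep_up = upsample_nearest(deep, factor)
    if (len(deep_up[0]), len(deep_up[0][0])) != (len(shallow[0]), len(shallow[0][0])):
        raise ValueError("shallow size must be an integer multiple of deep size")
    stacked = shallow + deep_up  # channel-wise concatenation
    return conv1x1_relu(stacked, weights, bias)


if __name__ == "__main__":
    shallow = [[[1.0] * 4 for _ in range(4)]]  # 1 channel, 4 x 4
    deep = [[[1.0, 2.0], [3.0, 4.0]]]          # 1 channel, 2 x 2
    fused = fuse(shallow, deep, weights=[[0.5, 0.5]], bias=[0.0])
    print(fused[0][0])  # [1.0, 1.0, 1.5, 1.5]
```

Concatenation followed by a 1 × 1 convolution lets the network learn how much to weight shallow detail against deep semantics; element-wise sum or product are common alternatives.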

Highlights

  • We show that FFESSD achieves state-of-the-art results on PASCAL VOC at real-time processing speed, and that it detects fuzzy targets better than the conventional Single Shot MultiBox Detector (SSD)

  • HyperNet [30], Online Hard Example Mining (OHEM) [31], and ION [32] have shortcomings in both accuracy and real-time performance, making it difficult for them to meet the needs of real-time detection on large, complex datasets
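The PASCAL VOC results quoted in this summary are reported as mean average precision (mAP). The abstract does not state which AP variant was used; as one common reference point, the sketch below computes the 11-point interpolated AP of the PASCAL VOC 2007 development kit for each class and averages over classes. It assumes detections have already been matched to ground truth (in the VOC protocol, a detection is a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5); the function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# (confidence score, True if the detection matched an unclaimed ground-truth box)
Detection = Tuple[float, bool]


def eleven_point_ap(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """VOC2007-style AP: mean of max precision at recall >= 0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        threshold = i / 10
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """AP for one class from pre-matched detections and the ground-truth count."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    tp = fp = 0
    recalls, precisions = [], []
    # Walk down the ranked list, tracking cumulative precision and recall.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    return eleven_point_ap(recalls, precisions)


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """mAP: unweighted mean of the per-class APs."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    results = {
        "cat": ([(0.9, True)], 1),                             # AP = 1.0
        "dog": ([(0.9, True), (0.8, False), (0.7, True)], 2),  # AP = 28/33
    }
    print(round(mean_average_precision(results), 4))  # 0.9242
```

Later VOC challenges (from 2010) replaced the 11-point rule with the area under the full interpolated precision–recall curve, so mAP figures are only comparable when the same variant is used.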

Introduction

Target detection is one of the main tasks of computer vision and is used extensively in areas such as driverless cars, face recognition, road detection, medical image processing, and human–computer interaction. Traditional target detection methods, such as those based on Local Binary Patterns (LBP) [1], the Scale-Invariant Feature Transform (SIFT) [2], Histograms of Oriented Gradients (HOG) [3], and Haar-like features (Haar) [4], rely on hand-crafted features. These features have obvious limitations: their extraction is complex and computationally slow, which makes it difficult to meet the needs of real-time detection on large, complex datasets.
