Abstract

High computing and memory requirements are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight backbone architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which limits the flexibility of the network structure and makes them unsuitable for some specific scenarios. In this paper, we propose SSD7-FFAM, a lightweight object detection network based on the Single-Shot MultiBox Detector (SSD) with a Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces computation by reducing the number of convolutional layers. We propose the novel FFAM method to improve detection accuracy. First, FFAM fuses feature maps rich in high-level semantic information with low-level feature maps to improve the detection accuracy of small objects. Then, a lightweight attention mechanism that cascades channel and spatial attention modules is employed to enhance the target's contextual information and guide the network to focus on its easily recognizable features. SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB of parameters, and an average running time of 0.033 s on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.
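To make the FFAM description concrete, the sketch below shows one possible interpretation of such a block (the layer names, channel sizes, and fusion operator are our assumptions, not the authors' exact design): a low-level feature map is fused with an upsampled high-level map, and the fused result is then refined by a channel attention module followed by a spatial attention module, in the cascaded order described above.

```python
# Hypothetical FFAM-style block (an illustrative sketch, not the paper's exact layers):
# fuse a low-level feature map with an upsampled high-level map, then apply
# channel attention followed by spatial attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Pool global context per channel and reweight the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.fc(F.adaptive_avg_pool2d(x, 1))
        mx = self.fc(F.adaptive_max_pool2d(x, 1))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Highlight informative spatial locations with a single-channel mask."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask


class FFAMBlock(nn.Module):
    """Fuse low- and high-level maps, then apply cascaded channel/spatial attention."""
    def __init__(self, low_channels, high_channels, out_channels):
        super().__init__()
        self.reduce_high = nn.Conv2d(high_channels, out_channels, 1)
        self.reduce_low = nn.Conv2d(low_channels, out_channels, 1)
        self.channel_att = ChannelAttention(out_channels)
        self.spatial_att = SpatialAttention()

    def forward(self, low_feat, high_feat):
        # Upsample the high-level map to the low-level spatial resolution.
        high_up = F.interpolate(self.reduce_high(high_feat),
                                size=low_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = self.reduce_low(low_feat) + high_up   # element-wise fusion (assumed)
        fused = self.channel_att(fused)               # channel attention first
        return self.spatial_att(fused)                # then spatial attention


# Example: fuse a 38x38 low-level map with a 19x19 high-level map.
if __name__ == "__main__":
    low = torch.randn(1, 128, 38, 38)
    high = torch.randn(1, 256, 19, 19)
    out = FFAMBlock(128, 256, 128)(low, high)
    print(out.shape)  # torch.Size([1, 128, 38, 38])
```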

Highlights

  • We propose SSD7-FFAM, a lightweight real-time object detection network for embedded devices that is trained from scratch and can be used in specific scenarios

  • The proposed novel feature fusion and attention mechanism (FFAM) method can effectively improve object detection accuracy


Introduction

As one of the fundamental visual recognition problems in computer vision, object detection is the basis of many other computer vision tasks, such as instance segmentation [1,2] and object tracking [3]. To improve detection accuracy, most research focuses on the design of increasingly complex object detectors such as R-CNN [1], the Single-Shot MultiBox Detector (SSD) [4], You Only Look Once (YOLO) [5], and their variants [2,6,7,8,9]. Although these networks achieve high detection accuracy, they are usually difficult for embedded devices to handle because of computational and memory limitations. The design and development of more efficient deep neural networks for real-time embedded object detection is therefore highly desirable.

