Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method.

Haipeng Zhao, Xinyue Cai, Long Zhang, Xiaofei Hu, Yangzhao Peng, Haojie Peng, Yang Zhou

doi:10.3390/s20071861

Open Access

https://doi.org/10.3390/s20071861


Abstract

Embedded and mobile smart devices are constrained by limited computing power and excessive power consumption. To address these problems, we propose Mixed YOLOv3-LITE, a lightweight real-time object detection network that can run on devices without a graphics processing unit (GPU) and on mobile devices. Using YOLO-LITE as the backbone network, Mixed YOLOv3-LITE adds residual blocks (ResBlocks) and parallel high-to-low resolution subnetworks, fully utilizes shallow network characteristics while increasing network depth, and uses a “shallow and narrow” convolution layer to build the detector, thereby balancing detection precision and speed on non-GPU computers and portable terminal devices. The experimental results show that the size of the proposed Mixed YOLOv3-LITE network model is 20.5 MB, which is 91.70%, 38.07%, and 74.25% smaller than YOLOv3, tiny-YOLOv3, and SlimYOLOv3-spp3-50, respectively. The mean average precision (mAP) achieved on the PASCAL VOC 2007 dataset is 48.25%, which is 14.48% higher than that of YOLO-LITE. On the VisDrone 2018-Det dataset, the mAP achieved with the Mixed YOLOv3-LITE network model is 28.50%, which is 18.50% and 2.70% higher than that of tiny-YOLOv3 and SlimYOLOv3-spp3-50, respectively. The results indicate that Mixed YOLOv3-LITE can achieve higher efficiency and better performance on mobile terminals and other devices.
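The size comparison in the abstract can be checked with a little arithmetic. The sketch below is not from the paper: it assumes that "X% smaller" means the Mixed YOLOv3-LITE size equals the baseline size times (1 - X/100), and it back-computes the baseline sizes that this reading implies. The function and variable names are illustrative only.

```python
def implied_baseline_mb(model_mb: float, reduction_pct: float) -> float:
    """Baseline size implied by a model size and a stated percentage reduction.

    Assumes model_mb = baseline_mb * (1 - reduction_pct / 100).
    """
    return model_mb / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract.
MIXED_YOLOV3_LITE_MB = 20.5
REDUCTIONS_PCT = {
    "YOLOv3": 91.70,
    "tiny-YOLOv3": 38.07,
    "SlimYOLOv3-spp3-50": 74.25,
}

# Back-computed baseline sizes (derived here, not reported in the abstract).
implied = {
    name: implied_baseline_mb(MIXED_YOLOV3_LITE_MB, pct)
    for name, pct in REDUCTIONS_PCT.items()
}

for name, size_mb in implied.items():
    print(f"{name}: about {size_mb:.1f} MB implied")
```

Under this assumption the three baselines come out at roughly 247 MB, 33 MB, and 80 MB; these are derived values and should be checked against the model sizes reported in the full text.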

Highlights

Object detection based on convolutional neural networks has been a popular research topic in the field of computer vision with a focus on object location and classification
Based on Trial 12, one layer of ResBlock was added before the output of the three-scale feature maps in Trial 13, and the mean average precision (mAP), recall rate, and F1 score of the model increased by approximately 0.8%
The experiment was divided into two parts: (1) training using subset A and testing using subset B; (2) training using subset

Summary

Introduction

Object detection based on convolutional neural networks has been a popular research topic in the field of computer vision, with a focus on object localization and classification. Feature extraction and classification of original images can be conducted via multi-layer convolution operations, and the position of an object in an image can be predicted using bounding boxes, providing the capability of visual understanding. The results of these studies can be widely applied in facial recognition [1], attitude prediction [2], video surveillance, and a variety of other intelligent applications [3,4,5]. Although there has been significant development in fast object detection methods [6,7,8], it is still inconvenient to implement convolutional neural network structures on non-graphics processing unit (GPU) or mobile devices. With the growth in the development of embedded and mobile intelligent devices with limited

Results

Discussion

Conclusion