Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN

Chenchen Xu,Yu Li,Guili Wang,Baojun Zhang,Shu Dai,Jianghua Yu,Lin Xu,Songsong Yan

doi:10.1155/2020/5761414

Abstract

This study presents a simple and effective Mask R-CNN algorithm for faster detection of vehicles and pedestrians. The method is of practical value for anticollision warning systems in intelligent driving. Deep neural networks with more layers have greater capacity but must also perform more complicated calculations. To overcome this disadvantage under practical conditions, this study adopts a Resnet-86 network as the backbone in place of the Resnet-101 backbone used in the Mask R-CNN algorithm. The results show that the Resnet-86 network can reduce the operation time and greatly improve accuracy. The vehicle and pedestrian categories are screened out from the Microsoft COCO dataset, and a new dataset is formed by screening and supplementing COCO, which makes training of the algorithm more efficient. Perhaps the most important part of our research is the proposal of a new algorithm, Side Fusion FPN. With it, the number of parameters does not increase, the amount of calculation increases by less than 0.000001, and the mean average precision (mAP) increases by 2.00 points. Overall, compared with the Mask R-CNN algorithm, our algorithm decreases the weight memory size by 9.43%, improves the training speed by 26.98%, improves the testing speed by 7.94%, decreases the loss value by 0.26, and increases the mAP by 17.53 points.

Highlights

To improve driving safety and reduce driver fatigue, research is being conducted on the development of intelligent driving technology [1].
Machine learning approaches first define features using one of the feature acquisition descriptors such as histogram of oriented gradient (HOG) [6] and perform classification using a technique such as a support vector machine (SVM) [7]. The HOG + SVM approach shows superior performance but suffers from low mean average precision and is not suitable for multistage process feature extraction [8]. Deep learning systems, such as convolutional neural networks (CNNs), show superiority in object detection because they aim to discover discriminative features from raw data [9]. The CNN was developed in the 1980s and 1990s [10], but since experiencing a resurgence of interest [11] in 2012, it has established a foothold in the field of computer vision and has grown at a rapid pace.
Based on Mask R-CNN, we propose a method to improve detection accuracy and speed through a Side Fusion feature pyramid network (SF-FPN) with Resnet-86.

Summary

Research Article

This study presents a simple and effective Mask R-CNN algorithm for faster detection of vehicles and pedestrians. Deep neural networks with more layers have greater capacity but must also perform more complicated calculations. To overcome this disadvantage under practical conditions, this study adopts a Resnet-86 network as the backbone in place of the Resnet-101 backbone used in the Mask R-CNN algorithm. The most important part of our research is the proposal of a new algorithm, Side Fusion FPN. With it, the number of parameters does not increase, the amount of calculation increases by less than 0.000001, and the mean average precision (mAP) increases by 2.00 points. The results show that, compared with the Mask R-CNN algorithm, our algorithm decreases the weight memory size by 9.43%, improves the training speed by 26.98%, improves the testing speed by 7.94%, decreases the loss value by 0.26, and increases the mAP by 17.53 points.
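
The reported gains mix two kinds of units: weight memory size and speed are given as relative changes in percent, whereas mAP is given as an absolute change in points. The following is a minimal sketch of that distinction only. The function names and all input values are hypothetical and chosen for illustration; they are not measurements from the paper.

```python
def relative_change_percent(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


def absolute_change_points(baseline: float, new: float) -> float:
    """Absolute difference between two scores that are already percentages."""
    return new - baseline


# Hypothetical values, used only to illustrate the arithmetic.
baseline_size_mb, new_size_mb = 250.0, 225.0  # weight file sizes
baseline_map, new_map = 50.0, 60.0            # mAP scores in percent

size_change = relative_change_percent(baseline_size_mb, new_size_mb)
map_points = absolute_change_points(baseline_map, new_map)
map_relative = relative_change_percent(baseline_map, new_map)

print(f"weight size change: {size_change:.2f}%")
print(f"mAP change: {map_points:.2f} points ({map_relative:.2f}% relative)")
```

With these made-up inputs, the same mAP improvement gives different numbers depending on whether it is stated in points or as a relative percentage, which is why the unit matters when reading the results.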

Introduction
