Abstract

The performance of convolutional neural network- (CNN-) based object detection has achieved incredible success. Howbeit, existing CNN-based algorithms suffer from a problem that small-scale objects are difficult to detect because it may have lost its response when the feature map has reached a certain depth, and it is common that the scale of objects (such as cars, buses, and pedestrians) contained in traffic images and videos varies greatly. In this paper, we present a 32-layer multibranch convolutional neural network named MBNet for fast detecting objects in traffic scenes. Our model utilizes three detection branches, in which feature maps with a size of 16 × 16, 32 × 32, and 64 × 64 are used, respectively, to optimize the detection for large-, medium-, and small-scale objects. By means of a multitask loss function, our model can be trained end-to-end. The experimental results show that our model achieves state-of-the-art performance in terms of precision and recall rate, and the detection speed (up to 33 fps) is fast, which can meet the real-time requirements of industry.

Highlights

  • Detecting various objects in images or videos from traffic scenes is a basic premise for many intelligent transportation systems

  • Representative algorithms include RCNN [4], fast RCNN [7], faster RCNN [9], and mask RCNN [13], and they are typical two-stage methods. e other is the object detection algorithm based on the regression method, which deals with the detection problem as a regression problem and directly predicts the location and classification of the objects. ese kinds of methods are typical one-stage methods, and they are fast, but the accuracy is relatively lower than the two-stage

  • (2) A scale-aware mechanism is proposed to adjusting the weights performing detection accurately for large, medium, and small-scale objects from various traffic scenes, which achieves better performance compared with other methods in terms of precision and recall rate and is able to meet the real-time requirements of application

Read more

Summary

Introduction

Detecting various objects (such as vehicles and pedestrians) in images or videos from traffic scenes is a basic premise for many intelligent transportation systems. Reasonable traffic management and control based on the movement of vehicles and pedestrians can reduce the occurrence of traffic accidents, road congestion, etc In this regard, considerable efforts have been made over the past decade. One of the most popular object detection methods is using sliding windows to generate candidate regions, features can be extracted from these regions and pretrained classifiers are applied to determine if these regions have certain objects or not. It leads to the huge computational cost. Representative algorithms include RCNN [4], fast RCNN [7], faster RCNN [9], and mask RCNN [13], and they are typical two-stage methods (which generate the proposals using a region generation method and classify and regress the proposals). e other is the object detection algorithm based on the regression method, which deals with the detection problem as a regression problem and directly predicts the location and classification of the objects. ese kinds of methods are typical one-stage methods, and they are fast, but the accuracy is relatively lower than the two-stage

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call