Dual Architecture Deep Learning Based Object Detection System for Autonomous Driving

Mahmoud M. Mahmoud, Ahmed R. Nasser

doi:10.33103/uot.ijccce.21.2.3

Abstract

Object detection for autonomous vehicles is a major challenge for researchers because it demands high accuracy and precision in real time. This work presents a deep learning approach based on a dual-architecture network design. A highly accurate multi-class convolutional neural network (CNN) is presented for input data classification. A Faster Region-Based Convolutional Neural Network (Faster R-CNN) with a modified Feature Pyramid Network (FPN) is used for better detection of tiny objects, and a You Only Look Once (YOLOv3) network is used for general detection. Each network independently detects the existence of an object. The decision maps are then fused and compared to decide whether an object is present. The Faster R-CNN with FPN model reported a higher Intersection over Union (IoU) and mean average precision (mAP) than YOLOv3. The approach is reliable and improves on existing state-of-the-art methods based on fully connected networks.

Index Terms: autonomous driving, computer vision, deep learning, object detection

Highlights

Interest in autonomous driving has grown enormously [1] due to the rise of deep learning and the progress of computer software, hardware, and processing power.
A detector's backbone learns the features of the image based on the Convolutional Neural Network (CNN) architecture, whereas its detection head predicts the bounding boxes based on these features.
The main types of object detectors [4] are either two-stage approaches, like Faster Region-Based Convolutional Neural Networks (Faster R-CNN) [5] and Region-based Fully Convolutional Networks (R-FCN) [6], or single-shot detectors, such as You Only Look Once (YOLO) [7-8] and Single Shot Detector (SSD) [9]. The former are more accurate, while the latter are generally faster.

Summary

INTRODUCTION

Interest in autonomous driving has grown enormously [1] due to the rise of deep learning and the progress of computer software, hardware, and processing power. Object detection uses deep convolutional neural networks (CNNs) to extract features because CNN features provide discriminative representations. A detector's backbone learns the features of the image based on the CNN architecture, whereas its detection head predicts the bounding boxes based on these features. The main types of object detectors [4] are either two-stage approaches, like Faster Region-Based Convolutional Neural Networks (Faster R-CNN) [5] and Region-based Fully Convolutional Networks (R-FCN) [6], or single-shot detectors, such as You Only Look Once (YOLO) [7-8] and Single Shot Detector (SSD) [9]. The former are more accurate, while the latter are generally faster. This paper proposes an architecture based on YOLOv3 for general detection and Faster R-CNN with a modified Feature Pyramid Network (FPN) [10]-[11] for the detection of tiny objects. The rest of the paper is structured as follows: Section 2 describes the related work, Section 3 provides the architecture of the model and the implemented state-of-the-art method, Section 4 outlines the experimental results and evaluation of the model, and the final section concludes the paper with discussion and future applications.
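As a rough illustration of the Intersection over Union (IoU) metric and of comparing the outputs of two independent detectors, the sketch below may help. It is not the authors' fusion procedure: the box format `(x1, y1, x2, y2)`, the function names `iou` and `fuse`, the agreement rule, and the threshold of 0.5 are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(dets_a: List[Box], dets_b: List[Box], thr: float = 0.5) -> List[Box]:
    """Assumed rule: keep a box from detector A only if some box
    from detector B overlaps it with IoU >= thr."""
    return [a for a in dets_a if any(iou(a, b) >= thr for b in dets_b)]


if __name__ == "__main__":
    detector_a = [(0, 0, 2, 2), (10, 10, 12, 12)]  # made-up boxes
    detector_b = [(0, 0, 2, 2.2)]                  # made-up boxes
    print(fuse(detector_a, detector_b))
```

An agreement rule like this favors precision, since a box survives only when both detectors report it. A union-style rule would instead favor recall, which may matter for tiny objects that only one of the two networks picks up.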

RELATED WORK

PROPOSED ARCHITECTURE

You Only Look Once (YOLO)

Feature Maps Fusion

EXPERIMENTAL RESULTS

CONCLUSION