Abstract

Vision-based smart driving technologies for road safety are popular research topics in computer vision. Precise detection of moving objects with continuous tracking capability is one of the most important of these technologies. In this paper, we propose an improved object detection system that combines a typical object detector with long short-term memory (LSTM) modules to further improve detection performance for smart driving. First, starting from a selected object detector, we combine all vehicle classes and bypass low-level features to improve its detection performance. After spatial association of the detected objects, the outputs of the improved object detector are fed into the proposed double-layer LSTM (dLSTM) modules to improve the detection of vehicles in various conditions, including newly appeared, already detected, and gradually disappearing vehicles. Stage-by-stage evaluations show that the proposed vehicle detection system with dLSTM modules can precisely detect vehicles without increasing computation.

Highlights

  • Accurate object detection has been a challenging problem in computer vision for many years

  • We suggest that an object detection system can be combined with long short-term memory (LSTM) modules [29] to improve its detection performance

  • Since we focus on vehicle detection and tracking on Taiwanese highways, we labeled cars, buses and trucks as a single vehicle class
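
The class merging in the last highlight can be illustrated with a minimal sketch. It assumes annotations are stored as dictionaries with a `label` and a `box` field; the label strings, the field names, and the helper `merge_vehicle_labels` are hypothetical and are not taken from the paper.

```python
# Hypothetical label names: the source only states that cars, buses and
# trucks are labeled as a single vehicle class.
VEHICLE_SOURCE_CLASSES = {"car", "bus", "truck"}
MERGED_CLASS = "vehicle"


def merge_vehicle_labels(annotations):
    """Return a copy of the annotations with car/bus/truck relabeled.

    Each annotation is assumed to be a dict with a 'label' key (str) and a
    'box' key (x, y, w, h). Other labels are passed through unchanged.
    """
    merged = []
    for ann in annotations:
        new_ann = dict(ann)  # shallow copy so the input list is not modified
        if new_ann["label"] in VEHICLE_SOURCE_CLASSES:
            new_ann["label"] = MERGED_CLASS
        merged.append(new_ann)
    return merged


# Illustrative annotations (made-up boxes, not data from the paper).
annotations = [
    {"label": "car", "box": (10, 20, 50, 40)},
    {"label": "truck", "box": (100, 30, 80, 60)},
    {"label": "pedestrian", "box": (200, 50, 20, 60)},
]
merged = merge_vehicle_labels(annotations)
```

Relabeling at the annotation level, before training, leaves the detector with one vehicle output instead of three, which is one plausible way to realize the merged class described above.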

Summary

Introduction

Accurate object detection has been a challenging problem in computer vision for many years. To improve performance and reduce computation, the proposed iYOLO object detector is designed to classify 30 on-road moving objects with one combined vehicle class that merges the car, bus, and truck classes. If the score C_i^b P^b exceeds a pre-defined threshold, an object of the ith class is considered to exist in the bth bounding box and to be detected by the proposed iYOLO detector. The prediction loss of the module is composed of three loss functions: (1) the location loss of the bounding box g = {x, y, w, h}, where (x, y) denotes the center position and w and h represent the width and height of the bounding box, respectively; (2) the classification loss, defined by the conditional probability p_s(i) of a specific class; and (3) the confidence loss, related to the probability P_s(o) that an object exists in the sth grid cell.
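
The thresholding rule and the three-part loss described above can be sketched as follows. This is only an illustration under stated assumptions: the squared-error form of each loss term, the weights `w_loc`, `w_cls`, `w_conf`, the default threshold of 0.5, and the function names `detect` and `composite_loss` are all hypothetical, since the summary does not give the exact formulas.

```python
def detect(class_probs, box_conf, threshold=0.5):
    """Return (box index, class index, score) for scores above the threshold.

    class_probs[b][i] plays the role of C_i^b and box_conf[b] the role of
    P^b. The threshold value is an assumption, not taken from the paper.
    """
    detections = []
    for b, (probs, conf) in enumerate(zip(class_probs, box_conf)):
        for i, class_prob in enumerate(probs):
            score = class_prob * conf
            if score > threshold:
                detections.append((b, i, score))
    return detections


def composite_loss(pred, target, w_loc=1.0, w_cls=1.0, w_conf=1.0):
    """Weighted sum of location, classification and confidence terms.

    pred and target are dicts for one grid cell with keys:
      'g': (x, y, w, h) bounding box,
      'p': per-class probabilities p_s(i),
      'o': objectness probability P_s(o).
    Squared error and the weights are assumptions made for illustration.
    """
    loc = sum((p - t) ** 2 for p, t in zip(pred["g"], target["g"]))
    cls = sum((p - t) ** 2 for p, t in zip(pred["p"], target["p"]))
    conf = (pred["o"] - target["o"]) ** 2
    return w_loc * loc + w_cls * cls + w_conf * conf


# Two boxes, two classes (made-up numbers).
detections = detect([[0.9, 0.1], [0.2, 0.8]], [0.8, 0.5])

# One grid cell whose predicted height and objectness differ from the target.
pred = {"g": (0.5, 0.5, 0.2, 0.2), "p": [1.0, 0.0], "o": 0.5}
target = {"g": (0.5, 0.5, 0.2, 0.4), "p": [1.0, 0.0], "o": 1.0}
loss = composite_loss(pred, target)
```

In this example only the first box and first class pass the threshold (0.9 × 0.8 = 0.72), and the loss is the sum of a location term of about 0.04 and a confidence term of 0.25, with no classification error.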

LSTM refiner
Results and discussion
Conclusion