Object Detection through Modified YOLO Neural Network

Tanvir Ahmad, Shah Nazir, Muhammad Yahya, Belal Ahmad, Amin Ul Haq, Yinglong Ma

doi:10.1155/2020/8403262

Abstract

Object detection has seen tremendous success in recent years, but detecting and identifying objects both accurately and quickly remains a very challenging task. Human beings can detect and recognize multiple objects in images or videos with ease regardless of the objects' appearance, but for computers it is challenging to identify and distinguish between objects. In this paper, a modified YOLOv1-based neural network is proposed for object detection. The new neural network model is improved in three ways. Firstly, the loss function of the YOLOv1 network is modified: the improved model replaces the margin style with a proportion style. Compared to the old loss function, the new one is more flexible and more reasonable in optimizing the network error. Secondly, a spatial pyramid pooling layer is added. Thirdly, an inception model with a convolution kernel of 1 × 1 is added, which reduces the number of weight parameters of the layers. Extensive experiments on the Pascal VOC 2007 and 2012 datasets showed that the proposed method achieved better performance.

Highlights

Human beings can detect and identify objects in their surroundings regardless of circumstances: whatever position the objects are in, and whether they are upside down, different in color or texture, partly occluded, etc. Therefore, humans make object detection look trivial. The same object detection and recognition with a computer requires a lot of processing to extract information on the shapes and objects in a picture. In computer vision, object detection refers to finding and identifying an object in an image or video. The main steps involved in object detection include feature extraction [1], feature processing [2,3,4], and object classification [5].
The loss function can be described in five parts: the first and second focus on the loss of the bounding box coordinates, the third and fourth are responsible for the difference in the confidence of having an object in the grid, and the fifth is responsible for the difference in class probability. λcoord and λnoobj are scalars that weight the individual terms of the loss function.
Overall, using the new network to extract features is very effective and robust, but it is not yet adequate and needs to be further improved. The improved network was tested on Pascal VOC 2007 and Pascal VOC 2012, respectively. The results are shown in Tables 3 and 4.
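The five-part loss described in the highlights can be illustrated with a minimal sketch. It follows the standard sum-squared YOLOv1 loss, not the proportion-style modification proposed in this paper, whose exact form is not given here. Several simplifications are assumptions made for illustration: one predicted box per grid cell, cells represented as plain dictionaries with hypothetical keys (`has_obj`, `x`, `y`, `w`, `h`, `conf`, `classes`), the target confidence supplied directly rather than computed from IoU, and the weights 5.0 and 0.5 taken from the original YOLOv1 formulation.

```python
import math


def yolo_v1_loss(pred_cells, true_cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Standard YOLOv1-style sum-squared loss, simplified to one box per cell.

    The dictionary layout and the default weights are illustrative
    assumptions, not the modified loss of the paper.
    """
    total = 0.0
    for pred, true in zip(pred_cells, true_cells):
        if true["has_obj"]:
            # Part 1: error in the box centre coordinates.
            xy = (pred["x"] - true["x"]) ** 2 + (pred["y"] - true["y"]) ** 2
            # Part 2: error in box width and height; the square root damps
            # the influence of large boxes (requires non-negative sizes).
            wh = (math.sqrt(pred["w"]) - math.sqrt(true["w"])) ** 2 \
                + (math.sqrt(pred["h"]) - math.sqrt(true["h"])) ** 2
            # Part 3: confidence error for a cell that contains an object.
            conf_obj = (pred["conf"] - true["conf"]) ** 2
            # Part 5: class probability error.
            cls = sum((p - t) ** 2
                      for p, t in zip(pred["classes"], true["classes"]))
            total += lambda_coord * (xy + wh) + conf_obj + cls
        else:
            # Part 4: confidence error for an empty cell (target is 0),
            # down-weighted by lambda_noobj.
            total += lambda_noobj * pred["conf"] ** 2
    return total


# Usage: one cell with an object and one empty cell.
truth = [
    {"has_obj": True, "x": 0.5, "y": 0.5, "w": 0.25, "h": 0.25,
     "conf": 1.0, "classes": [1.0, 0.0]},
    {"has_obj": False},
]
prediction = [
    {"x": 0.6, "y": 0.5, "w": 0.25, "h": 0.25,
     "conf": 0.9, "classes": [0.8, 0.2]},
    {"conf": 0.4},
]
print(yolo_v1_loss(prediction, truth))
```

Because λnoobj is smaller than 1, confident predictions in the many empty cells are penalized less than errors in cells that contain objects, while λcoord raises the weight of localization errors.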

Summary

Introduction

Human beings can detect and identify objects in their surroundings regardless of circumstances: whatever position the objects are in, and whether they are upside down, different in color or texture, partly occluded, etc. Therefore, humans make object detection look trivial. The same object detection and recognition with a computer requires a lot of processing to extract information on the shapes and objects in a picture. In computer vision, object detection refers to finding and identifying an object in an image or video. The main steps involved in object detection include feature extraction [1], feature processing [2,3,4], and object classification [5]. Feature extraction plays an essential role in the object detection and recognition process [6]. Various techniques have been used to detect objects accurately and efficiently for different applications, but these methods still suffer from a lack of accuracy and efficiency. To tackle these problems, machine learning and deep neural network methods are more effective for correct object detection.

Methods

Results

Conclusion