Abstract

Object recognition is one of the most challenging applications of computer vision and is widely used in areas such as autonomous cars, robotics, security tracking, and guidance for visually impaired people. With the rapid development of deep learning, many algorithms have improved the relationship between video analysis and image understanding. These algorithms differ in their network architectures but share the same aim of detecting multiple objects within a complex image. The absence of vision restricts a person's movement in unfamiliar places, so it is essential to train our technologies to guide blind people whenever they need assistance. This paper proposes a system that detects everyday objects and, at the same time, issues a voice prompt to alert the user to both the nearest and the farthest objects around them. The system is developed using two different algorithms, YOLO and YOLOv3, and both are tested under the same criteria to measure accuracy and performance. The YOLO pipeline uses a TensorFlow SSD MobileNet model, while YOLOv3 uses a Darknet model. For audio feedback, the gTTS (Google Text-to-Speech) Python library converts the alert statements into speech, and the pygame Python module plays the resulting audio. Both algorithms are tested on the MS-COCO dataset, which consists of more than 200K images, and are further analysed with a webcam in various situations to measure their accuracy in each scenario.
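As an illustration of the audio-feedback step described above, the following minimal sketch converts a detection alert into speech with gTTS and plays it back with pygame. The function name, the output file name, and the "near"/"far" wording are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the audio-feedback step: a detected object label is
# converted to speech with gTTS and played back with pygame.
# The announce() function and "alert.mp3" file name are illustrative only.
import time

from gtts import gTTS
import pygame


def announce(label, proximity="near"):
    """Turn an alert such as 'person is near' into speech and play it."""
    sentence = f"{label} is {proximity}"

    # Convert the sentence to speech and save it as a temporary MP3 file.
    tts = gTTS(text=sentence, lang="en")
    tts.save("alert.mp3")

    # Play the generated audio and wait until playback finishes.
    pygame.mixer.init()
    pygame.mixer.music.load("alert.mp3")
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)


if __name__ == "__main__":
    # e.g. a label returned by the detector for the current webcam frame
    announce("person", "near")
```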
