OBJECT DETECTION AND TEXT TO SPEECH CONVERSION BASED ON YOLOV7 USING DEEP LEARNING

Kota Tejaswini, Kethavathu Lakshmi Bai, Konda Sai Chaithanya

doi:10.55041/ijsrem18807

Abstract

Object detection is a computer vision technique that locates objects in images or videos by drawing bounding boxes around them. In this paper, we propose an object detection model based on deep learning, combined with text-to-speech conversion. The system detects objects with a deep learning model based on YOLO (You Only Look Once) and uses text-to-speech (TTS) to synthesize a voice announcement for each detected object. It is built in Python with OpenCV, and Google Text-to-Speech (gTTS) converts the text into an audio segment. Several variants of the YOLO algorithm are first compared, and the best-performing one, judged by the results obtained from training on the COCO dataset, is used. Once an object is detected, its name is displayed and a voice output is generated with the gTTS module. Our contribution is a visual substitution system that uses feature extraction and matching to recognize objects and provides voice feedback.

Index Terms—Object Detection, YOLO, OpenCV, Python, Google Text-to-Speech
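The pipeline described above (detect objects, keep the confident ones, turn their names into a sentence, and pass it to gTTS) can be sketched as follows. This is not the authors' code. It assumes that a YOLO model, for example one run through OpenCV, has already produced raw detections as (label, score, box) records. The names `Detection`, `filter_detections`, `announcement`, and `speak`, together with the 0.5 score and 0.45 IoU thresholds, are hypothetical choices made for illustration. The filtering step uses greedy per-class non-maximum suppression (NMS), a common YOLO post-processing technique, and stands in for whatever the paper's implementation actually does.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


@dataclass
class Detection:
    """One hypothetical detector output: class name, confidence, and box."""
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: Sequence[Detection], score_thr: float = 0.5,
                      iou_thr: float = 0.45) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy per-class NMS.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    candidates = sorted((d for d in dets if d.score >= score_thr),
                        key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        # Keep d unless a higher-scoring box of the same class overlaps it too much.
        if all(k.label != d.label or iou(k.box, d.box) < iou_thr for k in kept):
            kept.append(d)
    return kept


def announcement(dets: Sequence[Detection]) -> str:
    """Build the sentence to be spoken, e.g. 'Detected: car (2), person.'"""
    if not dets:
        return "No objects detected."
    # Counter preserves first-seen order, i.e. descending score for filter_detections output.
    counts = Counter(d.label for d in dets)
    parts = [f"{label} ({n})" if n > 1 else label for label, n in counts.items()]
    return "Detected: " + ", ".join(parts) + "."


def speak(text: str, path: str = "detection.mp3") -> str:
    """Save `text` as an MP3; needs the third-party gTTS package and network access."""
    from gtts import gTTS  # imported lazily so the rest runs without gTTS installed
    gTTS(text=text, lang="en").save(path)
    return path


if __name__ == "__main__":
    raw = [
        Detection("person", 0.91, (10, 20, 100, 200)),
        Detection("person", 0.80, (15, 25, 100, 200)),  # near-duplicate box, suppressed
        Detection("car", 0.88, (300, 50, 150, 90)),
        Detection("dog", 0.30, (0, 0, 10, 10)),          # below the score threshold
    ]
    print(announcement(filter_detections(raw)))  # prints "Detected: person, car."
```

In a full system, the resulting string would go to `speak(...)` and the MP3 file would then be played back. In an OpenCV-based pipeline, `cv2.dnn.NMSBoxes` provides a built-in, class-agnostic NMS over lists of boxes and scores, which could replace the hand-written `filter_detections`. The pure-Python version is shown here so that the suppression logic is visible.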
