Abstract

Object detection systems have seen growing use across a range of applications in the last few years, although hardware limitations still make the smallest objects difficult to detect. Many algorithms are used for object detection, such as YOLO, R-CNN, Fast R-CNN, and Faster R-CNN. Object detection using YOLO is faster than with the other algorithms, as YOLO scans the whole image in a single pass. Object detection is based on Convolutional Neural Networks (CNNs) and combines classification and localization. An object is detected by extracting its features, such as its color, texture, or shape. Based on these features, objects are classified into classes, and each class is assigned a label. When an image is subsequently provided to the model, it outputs the objects it detects, the location of a bounding box containing each object, the object's label, and a score indicating the confidence of the detection. Text-to-speech (TTS) conversion is a computer-based system that converts text into speech; here it is required to convert the labels into speech. The main aim is to detect even the smallest objects and to announce each object's label by voice for real-time object detection. The final proposed model architecture is more accurate and provides faster object detection results with voice output compared to previous research.
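The step from detector output to spoken labels can be illustrated with a minimal sketch. It assumes each detection is available as a label, a confidence score, and a bounding box; the `Detection` class, the `build_announcement` helper, the 0.5 threshold, and the sample values are all hypothetical and are not taken from the proposed model. The resulting string would then be handed to whichever TTS engine is in use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detector output (hypothetical structure for illustration)."""
    label: str
    score: float                      # confidence in [0, 1]
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels


def labels_to_announce(detections: List[Detection], min_score: float = 0.5) -> List[str]:
    """Keep detections at or above min_score, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)
    return [d.label for d in kept]


def build_announcement(detections: List[Detection], min_score: float = 0.5) -> str:
    """Build the text that would be passed to a TTS engine."""
    labels = labels_to_announce(detections, min_score)
    if not labels:
        return "No objects detected"
    return "Detected: " + ", ".join(labels)


# Made-up sample detections, standing in for real model output.
detections = [
    Detection("person", 0.91, (34, 20, 180, 310)),
    Detection("cup", 0.42, (200, 150, 240, 200)),
    Detection("dog", 0.77, (220, 90, 400, 300)),
]

message = build_announcement(detections)
print(message)
```

The threshold is an assumed tuning parameter: a lower value announces more (possibly spurious) objects, while a higher value risks dropping small objects that tend to receive lower scores.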
