Abstract

Detecting objects in images and videos, and recognizing text in audio files, is becoming increasingly complex. Classifying the objects in a static image is object detection; marking each detected object with a specific bounding box is called object localization. Both also apply to videos. Extracting text from audio files is part of the same multimodal setting, and extracting the required data across these modalities raises many issues. This paper introduces a two-way object detection and localization approach, the Ensemble Object Detection and Localization Framework (EODLF), to overcome significant issues such as overlapping objects in detection. The framework consists of two algorithms. The first detects objects in static images using accurate bounding boxes, with rapid simultaneous localization and mapping (RSLAM) introduced to improve accuracy. The second detects objects in videos and images and recognizes text from audio; it uses advanced short- and long-range object linking, called Advanced Tubelet Rendering, integrated with a Region Proposal Network (RPN) to detect objects with bounding boxes at high accuracy (such as 98.78), and it also reports the count and names of the objects in the videos and images. A further advantage of the system is that it detects frequent words in audio. The performance of the proposed approach is compared with various state-of-the-art approaches.
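
To make the overlap problem concrete, the following is a minimal sketch of greedy non-maximum suppression, a common generic way to resolve overlapping bounding boxes, followed by a per-class object count. It is not the EODLF, RSLAM, or Advanced Tubelet Rendering method described above; the function names, the detection dictionary layout, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(detections, iou_threshold=0.5):
    """Greedy NMS (hypothetical helper): visit detections from highest to
    lowest score and keep one only if it does not overlap an already kept
    detection of the same label by iou_threshold or more."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        overlaps_kept = any(
            det["label"] == k["label"]
            and iou(det["box"], k["box"]) >= iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


def count_objects(detections):
    """Count the kept detections per object name."""
    return Counter(d["label"] for d in detections)


# Illustrative input only: two near-duplicate "car" boxes and one "person".
detections = [
    {"label": "car", "box": (0, 0, 10, 10), "score": 0.9},
    {"label": "car", "box": (1, 1, 10, 10), "score": 0.8},
    {"label": "person", "box": (20, 20, 30, 30), "score": 0.7},
]
kept = suppress_overlaps(detections)
counts = count_objects(kept)
```

With the illustrative input, the two car boxes overlap heavily, so only the higher-scoring one is kept and `counts` reports one car and one person. A real pipeline would take its boxes and scores from a detector such as an RPN-based model rather than from hand-written values.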
