Abstract

Localizing moments in videos is a new and challenging task in computer science, with applications in faster video retrieval, query processing, and behavioral analysis. The process involves several stages: video understanding, video segmentation, query processing using NLP, and localization of concepts within the video. Although there have been many attempts at video understanding in NLP and computer vision in recent years, they do not cover the long, untrimmed videos found in real-life scenarios. We propose a deep learning-based solution that combines Random Forest and Bi-LSTM approaches to localize labels within segments, along with the times at which they occur in those segments. We used the YouTube-8M dataset, released by YouTube for a Kaggle challenge, to train our frame-based model, and then used that model to classify segments with sliding windows of size 5. Our approach offers a naïve but robust way to model this concept and a starting point for tackling this large problem. Further improvements to the Bi-LSTM and Random Forest models, for example by incorporating VLAD features, could lead to better results.
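
The segment-classification step can be illustrated with a minimal sketch. It assumes a frame-level classifier (standing in for the Random Forest or Bi-LSTM model) has already produced one list of per-class scores per frame, and that frames are sampled at one per second, as in YouTube-8M's frame-level features. The function name `localize_segments`, the non-overlapping stride, the mean-pooling of scores, and the `threshold` parameter are illustrative assumptions, not the authors' exact method.

```python
from typing import List, Tuple


def localize_segments(
    frame_probs: List[List[float]],
    window: int = 5,
    stride: int = 5,
    threshold: float = 0.5,
) -> List[Tuple[int, int, int, float]]:
    """Hypothetical helper: mean-pool per-frame class scores over sliding
    windows and return (start_sec, end_sec, label, score) for each window
    whose best class score reaches `threshold`.

    Assumes one frame per second, so frame indices double as seconds.
    """
    results = []
    n_frames = len(frame_probs)
    # Slide a fixed-size window over the frames; stride == window gives
    # non-overlapping segments (an assumption, not stated in the source).
    for start in range(0, n_frames - window + 1, stride):
        chunk = frame_probs[start:start + window]
        n_classes = len(chunk[0])
        # Average each class score across the frames in the window.
        means = [sum(frame[c] for frame in chunk) / window for c in range(n_classes)]
        # Pick the highest-scoring class for this segment.
        label = max(range(n_classes), key=lambda c: means[c])
        score = means[label]
        if score >= threshold:
            results.append((start, start + window, label, score))
    return results


if __name__ == "__main__":
    # Toy scores for 12 frames and 2 classes (made-up data for illustration).
    probs = [[0.9, 0.1]] * 5 + [[0.2, 0.8]] * 5 + [[0.5, 0.5]] * 2
    for seg in localize_segments(probs):
        print(seg)
```

On the toy input, the first 5-second window is labeled class 0 and the second class 1. The two trailing frames do not fill a complete window, so they are ignored. Overlapping windows (a stride smaller than 5) would give finer time boundaries at the cost of more classifications.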
