Abstract
This paper proposes a workflow for efficient interaction with long-form video: given a natural-language query, the system automatically identifies and returns the timestamps of the requested actions or events, eliminating the need for manual browsing. A Hierarchical Query Processor decomposes each user request into temporally dependent sub-queries, while a Timestamp-Aware Frame Encoder associates visual frames with precise timestamps, so that video content is modeled for time-sensitive retrieval. Three further components are integrated to optimize performance: a Sliding Video Q-Former that captures temporal relationships across frames, a Temporal Attention Cache that reuses pre-computed attention patterns, and a Language Model that interprets queries and generates precise timestamped responses. The approach is particularly valuable for instructional media, surveillance analysis, and content search, where time-sensitive accuracy and contextual understanding are crucial.

Keywords—Video Comprehension; Temporal Localization; Hierarchical Query Processing; Timestamp Embedding; Attention Caching; Large Language Models (LLMs)
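To make the pipeline concrete, the sketch below illustrates the two front-end ideas in the abstract: hierarchically decomposing a compound query into temporally ordered sub-queries, then localizing each sub-query against timestamped frame representations. All names (`SubQuery`, `Frame`, `decompose_query`, `locate`, `answer`) are hypothetical, and the string-matching localizer is a toy stand-in for the paper's learned encoder and Q-Former; it is a minimal sketch, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SubQuery:
    text: str
    order: int  # temporal rank within the original request

def decompose_query(query: str) -> List[SubQuery]:
    # Toy hierarchical decomposition: split a compound request on the
    # connective "then" into temporally ordered sub-queries (stand-in
    # for an LLM-based Hierarchical Query Processor).
    parts = [p.strip() for p in query.lower().split(" then ") if p.strip()]
    return [SubQuery(text=p, order=i) for i, p in enumerate(parts)]

@dataclass
class Frame:
    timestamp: float  # seconds from video start
    caption: str      # stand-in for a timestamp-aware visual embedding

def locate(sub: SubQuery, frames: List[Frame],
           not_before: float = 0.0) -> Optional[float]:
    # Return the first timestamp at or after `not_before` whose frame
    # matches the sub-query; the bound enforces the temporal dependency
    # between consecutive sub-queries.
    for f in frames:
        if f.timestamp >= not_before and sub.text in f.caption:
            return f.timestamp
    return None

def answer(query: str, frames: List[Frame]) -> List[float]:
    # End-to-end sketch: decompose the request, then localize each
    # sub-query in order, carrying the previous match forward.
    stamps, cursor = [], 0.0
    for sub in decompose_query(query):
        t = locate(sub, frames, not_before=cursor)
        if t is None:
            break
        stamps.append(t)
        cursor = t
    return stamps
```

For example, on frames captioned "open the lid" (0 s), "pour water" (4 s), and "close the lid" (8 s), the query "open the lid then pour water" yields the timestamps `[0.0, 4.0]`, with the second match constrained to occur after the first.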