Abstract

Video moment retrieval locates a target moment in a video given a sentence query. Recent approaches have made remarkable advances by training on large-scale video-sentence annotations. Because these annotations require extensive human labor and expertise, there is a need for unsupervised approaches. Generating pseudo-supervision from videos is an effective strategy. Leveraging large-scale pre-trained models, we introduce external knowledge into the construction of pseudo-supervision. The main technical challenges are improving the diversity of the pseudo-supervision and alleviating the noise introduced by external knowledge. To address these problems, we propose two Knowledge-based Pseudo Supervision Construction (KPSC) strategies: KPSC-P and KPSC-F. Both follow two steps: generating diverse samples and alleviating knowledge chaos. The main difference is that the former first learns a representation space with prompt tuning, while the latter directly utilizes data information. KPSC-P has two modules: 1) Proposal Prompt (PP), which generates temporal proposals; and 2) Verb Prompt (VP), which generates pseudo-queries with noun-verb patterns. KPSC-F also has two modules: 1) Captioner, which generates candidate queries; and 2) Filter, which alleviates knowledge chaos. KPSC thus comprises two attempts to extract knowledge from pre-trained models. Extensive experiments show that both strategies outperform existing unsupervised methods on two public datasets (Charades-STA and ActivityNet-Captions) and perform on par with several methods that use stronger supervision.
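
The generate-then-filter structure described for KPSC-F can be illustrated with a minimal sketch. It is not the authors' implementation. The sliding-window proposals, the `captioner` and `scorer` callables, the `threshold` parameter, and the toy stand-ins below are all assumptions introduced for illustration. The abstract does not specify how the Captioner or the Filter operate internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PseudoSample:
    start: float  # proposal start time (seconds)
    end: float    # proposal end time (seconds)
    query: str    # pseudo-query paired with the proposal
    score: float  # relevance score assigned by the filter


def sliding_proposals(duration: float, window: float,
                      stride: float) -> List[Tuple[float, float]]:
    """Enumerate fixed-length temporal proposals (hypothetical stand-in)."""
    proposals = []
    start = 0.0
    while start + window <= duration:
        proposals.append((start, start + window))
        start += stride
    return proposals


def build_pseudo_supervision(
    duration: float,
    window: float,
    stride: float,
    captioner: Callable[[float, float], List[str]],
    scorer: Callable[[float, float, str], float],
    threshold: float,
) -> List[PseudoSample]:
    """Step 1: generate candidate queries per proposal.
    Step 2: keep only candidates whose score reaches the threshold."""
    samples = []
    for start, end in sliding_proposals(duration, window, stride):
        for query in captioner(start, end):
            score = scorer(start, end, query)
            if score >= threshold:
                samples.append(PseudoSample(start, end, query, score))
    return samples


# Toy stand-ins (assumptions): a real system would call pre-trained models.
def toy_captioner(start: float, end: float) -> List[str]:
    return ["person opens door", "person runs"]


def toy_scorer(start: float, end: float, query: str) -> float:
    return 0.9 if "opens" in query and start < 5 else 0.2


samples = build_pseudo_supervision(
    duration=10.0, window=4.0, stride=3.0,
    captioner=toy_captioner, scorer=toy_scorer, threshold=0.5,
)
for s in samples:
    print(s)
```

In this sketch, the threshold trades diversity against noise: lowering it keeps more (proposal, query) pairs, while raising it discards more candidates that score poorly.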
