Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval

Xun Yang, Tat-Seng Chua, Yixin Cao, Meng Wang, Xun Wang, Jianfeng Dong

doi:10.1145/3397271.3401151

Open Access

https://doi.org/10.1145/3397271.3401151

Abstract

The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, and they are usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos lie much closer to each other. Despite its simplicity, it forgoes the syntactic structure of text queries, which makes it suboptimal for modeling complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method that jointly learns the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the text query. We then design a tree-augmented query encoder to derive a structure-aware query representation and a temporal attentive video encoder to model the temporal characteristics of videos. Finally, both the query and the videos are mapped into a joint embedding space for matching and ranking. With this approach, we gain a better understanding and modeling of complex queries, thereby achieving better video retrieval performance. Extensive experiments on large-scale video retrieval benchmark datasets demonstrate the effectiveness of our approach.
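The final matching-and-ranking step can be illustrated with a minimal sketch. It assumes the query and each video have already been encoded into vectors in the joint embedding space, and it uses cosine similarity as the matching score, which is an assumption here since the abstract does not name the similarity function. The function names, video identifiers, and toy vectors are hypothetical and stand in for the outputs of the query and video encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return (video_id, score) pairs sorted from most to least similar to the query."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical): in the described method these would come from
# the tree-augmented query encoder and the temporal attentive video encoder.
query = [1.0, 0.0, 1.0]
videos = {
    "video_a": [0.9, 0.1, 0.8],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.5],
}

ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

Retrieval then amounts to returning the top-ranked videos; the quality of the ranking depends entirely on how well the two encoders place semantically matching queries and videos near each other.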

Publication Date: Jul 25, 2020
Citations: 104
License type: cc-by-nc-nd

