Information Retrieval Tasks Research Articles

As an increasingly popular task in multimedia information retrieval, video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query. Most previous methods depend heavily on numerous manual annotations (i.e., moment boundaries), which are extremely expensive to acquire in practice. In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop. In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain (“source domain”), but the domain of interest (“target domain”) only contains unannotated datasets. As far as we know, we present the first study on cross-domain VMR. To address this new task, we propose a novel Multi-Modal Cross-Domain Alignment (MMCDA) network to transfer the annotation knowledge from the source domain to the target domain. However, due to the domain discrepancy between the source and target domains and the semantic gap between videos and queries, directly applying trained models to the target domain generally leads to a performance drop. To solve this problem, we develop three novel modules: (i) a domain alignment module is designed to align the feature distributions between different domains of each modality; (ii) a cross-modal alignment module aims to map both video and query features into a joint embedding space and to align the feature distributions between different modalities in the target domain; (iii) a specific alignment module tries to obtain the fine-grained similarity between a specific frame and the given query for optimal localization. By jointly training these three modules, our MMCDA can learn domain-invariant and semantic-aligned cross-modal representations.
Extensive experiments on three challenging benchmarks (ActivityNet Captions, Charades-STA and TACoS) illustrate that our cross-domain method MMCDA outperforms all state-of-the-art single-domain methods. Impressively, MMCDA raises the performance by more than 7% in representative cases, which demonstrates its effectiveness.

Recent years have seen enormous gains in core information retrieval tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional information retrieval test collections, such as those developed by TREC, the MS MARCO datasets employ substantially more queries (thousands vs. dozens) with substantially fewer known relevant items per query (often just one). For example, 94% of the nearly seven thousand queries in the MS MARCO passage ranking development set have only a single known relevant passage, and no query has more than four. Given the sparsity of these relevance labels, the MS MARCO leaderboards track improvements with mean reciprocal rank (MRR). In essence, the known relevant item is treated as the “right answer” or “best answer”, with rankers scored on their ability to place this item as high in the ranking as possible. In working with these sparse labels, we have observed that the top items returned by a ranker often appear superior to judged relevant items. Others have reported the same observation. To test this observation, we employed crowdsourced workers to make preference judgments between the top item returned by a modern neural ranking stack and a judged relevant item for the nearly seven thousand queries in the passage ranking development set. The results support our observation. If we imagine a hypothetical perfect ranker under MRR, with a score of 1 on all queries, our preference judgments indicate that a searcher would prefer the top result from a modern neural ranking stack more frequently than the top result from the hypothetical perfect ranker, making our neural ranker “better than perfect”. To understand the implications for the leaderboard, we pooled the top document from available runs near the top of the passage ranking leaderboard for over 500 queries.
We employed crowdsourced workers to make preference judgments over these pools and re-evaluated the runs. Our results support our concerns that current MS MARCO datasets may no longer be able to recognize genuine improvements in rankers. In future, if rankers are measured against a single answer, this answer should be the best answer or most preferred answer, and maintained with ongoing judgments. Since only the best known answer is required, this ongoing maintenance might be performed with shallow pooling. When a previously unjudged document is surfaced as the top item in a ranking, it can be directly compared with the previous best known answer.
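The scoring and maintenance ideas described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `prefer` callback (standing in for a crowdsourced preference judgment), and the default cutoff of 10 are assumptions made for illustration. The first function computes MRR when each query has a single known answer; the second replaces the best known answer whenever a run's top item is preferred over it.

```python
from typing import Callable, Dict, List


def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         best_answer: Dict[str, str],
                         cutoff: int = 10) -> float:
    """MRR with one known answer per query (cutoff is an assumed default)."""
    total = 0.0
    for qid, ranked in rankings.items():
        target = best_answer.get(qid)
        # Reciprocal rank is 1/rank of the known answer, or 0 if it is
        # absent from the top `cutoff` results.
        for rank, doc_id in enumerate(ranked[:cutoff], start=1):
            if doc_id == target:
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


def update_best_answers(rankings: Dict[str, List[str]],
                        best_answer: Dict[str, str],
                        prefer: Callable[[str, str, str], bool]) -> Dict[str, str]:
    """Shallow pooling sketch: judge only each run's top item.

    `prefer(qid, candidate, current)` is a hypothetical stand-in for a
    human preference judgment; it returns True when the candidate is
    preferred over the current best known answer.
    """
    updated = dict(best_answer)
    for qid, ranked in rankings.items():
        if not ranked:
            continue
        top = ranked[0]
        current = updated.get(qid)
        # Simplification: any top item that differs from the current best
        # answer is sent for judgment; no record of past judgments is kept.
        if current is None or (top != current and prefer(qid, top, current)):
            updated[qid] = top
    return updated
```

Under these assumptions, a run scored against the original labels can be re-scored by calling `update_best_answers` first and passing the result to `mean_reciprocal_rank`, which mirrors the re-evaluation step described above in spirit only.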

Articles published on Information Retrieval Tasks

Scholarly Communication: A Discipline that should be promoted

Comunicación académica: una disciplina que nos conviene impulsar

Extractive Arabic Text Summarization-Graph-Based Approach

DATM: A Novel Data Agnostic Topic Modeling Technique With Improved Effectiveness for Both Short and Long Text

A comparison of two text specificity measures analyzing a heterogenous text corpus

Information Retrieval With Chessboard-Shaped Topology for Hyperspectral Target Detection

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

The Process and Algorithm Analysis of Text Mining System Based on Artificial Intelligence

A Comprehensive Survey of Music Genre Classification Using Audio Files

Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross-modal information retrieval

A neural harmonic-aware network with gated attentive fusion for singing melody extraction

Deep Learning Techniques for Pattern Recognition in EEG Audio Signal-Processing-Based Eye-Closed and Eye-Open Cases

Dealing with textual noise for robust and effective BERT re-ranking

Conversational Agents for Information Retrieval in the Education Domain

Evaluating the Impact on Clinical Task Efficiency of a Natural Language Processing Algorithm for Searching Medical Documents: Prospective Crossover Study.

Automatic signboard detection and localization in densely populated developing cities

Construction of Fuzzy Linguistic Approximate Concept Lattice in an Incomplete Fuzzy Linguistic Formal Context

Shallow pooling for sparse labels

GCKG: Novel Gated Convolutional embedding model for Knowledge Graphs

Relevance Feedback and Deep Neural Network-Based Semantic Method for Query Expansion
