Cross-modal Information Retrieval Research Articles

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level. Specifically, we present a novel approach called Transformer Encoder Reasoning and Alignment Network (TERAN). TERAN enforces a fine-grained match between the underlying components of images and sentences (i.e., image regions and words, respectively) to preserve the informative richness of both modalities. TERAN obtains state-of-the-art results on the image retrieval task on both MS-COCO and Flickr30k datasets. Moreover, on MS-COCO, it also outperforms current approaches on the sentence retrieval task. Focusing on scalable cross-modal information retrieval, TERAN is designed to keep the visual and textual data pipelines well separated. Cross-attention links invalidate any chance to separately extract visual and textual features needed for the online search and the offline indexing steps in large-scale retrieval systems. In this respect, TERAN merges the information from the two domains only during the final alignment phase, immediately before the loss computation. We argue that the fine-grained alignments produced by TERAN pave the way toward the research for effective and efficient methods for large-scale cross-modal information retrieval. We compare the effectiveness of our approach against relevant state-of-the-art methods. On the MS-COCO 1K test set, we obtain an improvement of 5.7% and 3.5% respectively on the image and the sentence retrieval tasks on the Recall@1 metric. The code used for the experiments is publicly available on GitHub at https://github.com/mesnico/TERAN .
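The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes region and word features have already been produced by two separate pipelines, and that a global image-sentence score is obtained by pooling a word-region cosine-similarity matrix. The pooling rule (max over regions, sum over words), the hinge-based triplet loss, the margin value, and all function names here are illustrative assumptions, not necessarily the authors' exact formulation; see the linked repository for the actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def alignment_matrix(regions, words):
    """Word-region alignment matrix: entry [i][j] compares region i with word j."""
    return [[cosine(r, w) for w in words] for r in regions]


def global_similarity(regions, words):
    """Pool the alignment matrix into one image-sentence score.

    Assumed pooling: for each word take its best-matching region,
    then sum over words.
    """
    matrix = alignment_matrix(regions, words)
    return sum(
        max(matrix[i][j] for i in range(len(regions)))
        for j in range(len(words))
    )


def hinge_triplet_loss(positive_score, negative_score, margin=0.2):
    """Assumed hinge-based loss: the matching pair should beat the
    non-matching pair by at least `margin`."""
    return max(0.0, margin + negative_score - positive_score)


# Toy features: the two modalities only meet inside global_similarity.
image_regions = [[1.0, 0.0], [0.0, 1.0]]
matching_words = [[1.0, 0.0], [0.0, 1.0]]
unrelated_words = [[-1.0, 0.0], [0.0, -1.0]]

positive = global_similarity(image_regions, matching_words)
negative = global_similarity(image_regions, unrelated_words)
loss = hinge_triplet_loss(positive, negative)
```

Because the features of each modality are computed independently and only combined in `global_similarity`, image features could in principle be indexed offline and compared against query-sentence features at search time, which is the separation the abstract argues for.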


e14050 Background: Clinical trials suffer from insufficient patient (pt) recruitment. The availability of electronic health records (EHR) and trial eligibility criteria (EC) is promising for data-driven pt-trial matching. The objective is to find qualified pts given their EHR and trial EC expressed as unstructured text. Pseudo-Siamese networks are a recent approach in information retrieval and have shown great success in cross-modal information retrieval problems such as semantic image-text retrieval (e.g., matching images with text descriptions). We aim to find matches between pts and clinical trials using Pseudo-Siamese network based cross-modal retrieval. Our model addresses the following challenges: (1) how to match unstructured EC text with structured EHR, where EC often encode more general disease concepts and EHR represent pt conditions using more specific medical codes; (2) how to capture pts' evolving health conditions; (3) how to explicitly handle the difference between inclusion and exclusion criteria. Methods: Our matching model addresses these challenges as follows: (1) we augment the medical codes in pts' records with their textual descriptions and hierarchical taxonomies, so that concepts can be embedded at finer and coarser levels for better concept alignment across pt data and ECs; (2) we include an attentive dynamic memory network that extracts the best-matching and more recent pt EHR to match with ECs; (3) we introduce a composite loss term that maximizes the similarity between pt records and inclusion criteria while minimizing the similarity between pt records and exclusion criteria. Results: We evaluated our model on a pt-trial matching dataset built on the ECs collected from 590 clinical trials from ClinicalTrials.gov. We also extracted 83,371 pt claims data from the IQVIA database (collected 2002-2018), where each pt is eligible for at least one trial. We compared our model with leading pt-trial matching models. Our model significantly outperforms the best baseline model, with a 24.3% relative improvement in accuracy. We also tested these models on 34 oncology trials in 25 cancers. Results will be reported. Conclusions: Pseudo-Siamese networks have successfully solved cross-modal information retrieval problems. We therefore propose a new pt-trial matching model based on a Pseudo-Siamese network. Experiments on real-world datasets demonstrated that our model significantly outperforms existing work in pt-trial matching for oncology trials.
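The composite loss idea in the Methods can be sketched as follows. The abstract does not give the formula, so the specific form below is an assumption: cosine similarity as the similarity measure, a `1 - similarity` penalty for inclusion criteria, a clipped `max(0, similarity)` penalty for exclusion criteria, and averaging within each group. The function names and toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean(values):
    """Average of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def composite_loss(patient, inclusion_criteria, exclusion_criteria):
    """Assumed composite loss over embedded patient records and criteria.

    Inclusion term: penalizes low similarity (pushes similarity up).
    Exclusion term: penalizes positive similarity (pushes similarity down).
    """
    inclusion_term = mean([1.0 - cosine(patient, c) for c in inclusion_criteria])
    exclusion_term = mean([max(0.0, cosine(patient, c)) for c in exclusion_criteria])
    return inclusion_term + exclusion_term


# Toy embeddings (hypothetical): a patient who meets the inclusion
# criterion and is unrelated to the exclusion criterion.
patient_embedding = [1.0, 0.0]
eligible_loss = composite_loss(patient_embedding, [[1.0, 0.0]], [[0.0, 1.0]])
# The same patient against a trial whose criteria are reversed.
ineligible_loss = composite_loss(patient_embedding, [[0.0, 1.0]], [[1.0, 0.0]])
```

Under these assumptions the loss is lowest when the patient embedding is close to every inclusion criterion and not positively aligned with any exclusion criterion, which is the behavior the abstract describes.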



Articles published on Cross-modal Information Retrieval

VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval

RoCC: robust covert communication based on cross-modal information retrieval

Multi-Modal Machine Learning in Engineering Design: A Review and Future Directions

LANGUAGE DETECTION USING MACHINE LEARNING

Annotate and retrieve in vivo images using hybrid self-organizing map

c-SNE: Deep Cross-modal Retrieval based on Subjective Information using Stochastic Neighbor Embedding

Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross-modal information retrieval

Analysis of Network Information Retrieval Method Based on Metadata Ontology

Graph Convolutional Networks for Cross-Modal Information Retrieval

Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders

On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval.

Clustering-driven Deep Adversarial Hashing for scalable unsupervised cross-modal retrieval

Functional heterogeneity in the left lateral posterior parietal cortex during visual and haptic crossmodal dot-surface matching.

Cross-modal learning with prior visual relation knowledge

Patient trial matching using pseudo-siamese network.

Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval

CMIR-NET: A deep learning based model for cross-modal retrieval in remote sensing

Learning cross-modal correlations by exploring inter-word semantics and stacked co-attention

Parallel text alignment

