Abstract

Cross-modal similarity query has become a prominent research topic for managing multimodal datasets such as images and texts. Existing research generally focuses on query accuracy by designing complex deep neural network models and hardly considers query efficiency and interpretability simultaneously, both of which are vital properties of a cross-modal semantic query processing system on large-scale datasets. In this work, we investigate multi-grained common semantic embedding representations of images and texts and integrate an interpretable query index into the deep neural network by developing a novel Multi-grained Cross-modal Query with Interpretability (MCQI) framework. The main contributions are as follows: (1) By integrating coarse-grained and fine-grained semantic learning models, a multi-grained cross-modal query processing architecture is proposed to ensure the adaptability and generality of query processing. (2) To capture the latent semantic relations between images and texts, the framework combines an LSTM with an attention mechanism, which enhances accuracy for cross-modal queries and lays the foundation for interpretable query processing. (3) An index structure and a corresponding nearest-neighbor query algorithm are proposed to boost the efficiency of interpretable queries. (4) A distributed query algorithm is proposed to improve the scalability of our framework. Compared with state-of-the-art methods on widely used cross-modal datasets, the experimental results show the effectiveness of our MCQI approach.
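
To make the multi-grained idea concrete, the following minimal sketch shows one plausible way a query score could balance fine-grained (local) and coarse-grained (global) similarities with a weight factor, which the paper mentions as balancing the two feature granularities. The function name, the use of cosine similarity, and the default weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_grained_score(img_fine, txt_fine, img_coarse, txt_coarse, alpha=0.5):
    """Hypothetical combination of the two granularities described in the
    abstract: alpha balances fine-grained and coarse-grained similarity.
    All inputs are assumed to be L2-normalized embedding vectors, so the
    dot product equals cosine similarity."""
    fine = float(np.dot(img_fine, txt_fine))        # patch/word-level match
    coarse = float(np.dot(img_coarse, txt_coarse))  # whole image/sentence match
    return alpha * fine + (1.0 - alpha) * coarse
```

A larger alpha favors local patch-to-phrase alignment, while a smaller alpha favors the global image-to-sentence match.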

Highlights

  • With the rapid development of computer science and technology, multimedia data such as images and texts have been proliferating on the Internet and have become a primary medium through which humans perceive the world

  • Cross-modal similarity query has become an essential technique with wide applications, such as search engines and multimedia data management

  • The second stage is the index construction stage, in which an M-tree index and an inverted index are integrated to process efficient and interpretable queries; we introduce it in terms of the embedding representations of multimodal data and interpretable query processing
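
The index construction stage described above lends itself to a small illustration. The sketch below is a toy stand-in, not the paper's algorithm: a brute-force scan replaces the real M-tree, and a plain dictionary plays the role of the inverted index that maps each result to the patch relation tuples explaining why it matched. All names (`HybridIndex`, `knn`, `patch_tuples`) are hypothetical.

```python
import heapq
import numpy as np

class HybridIndex:
    """Toy stand-in for the hybrid index: a metric index for kNN (here a
    brute-force scan instead of a real M-tree) plus an inverted index from
    object ids to the patch relation tuples that make a match explainable."""

    def __init__(self):
        self.ids, self.vecs = [], []
        self.explanations = {}  # id -> list of (image_patch, text_phrase) tuples

    def insert(self, obj_id, vec, patch_tuples):
        self.ids.append(obj_id)
        self.vecs.append(np.asarray(vec, dtype=np.float64))
        self.explanations[obj_id] = patch_tuples

    def knn(self, query, k):
        """Return the k nearest neighbors by Euclidean distance, each paired
        with the patch tuples that make the result interpretable."""
        query = np.asarray(query, dtype=np.float64)
        dists = [(np.linalg.norm(v - query), i) for i, v in zip(self.ids, self.vecs)]
        nearest = heapq.nsmallest(k, dists, key=lambda t: t[0])
        return [(i, d, self.explanations[i]) for d, i in nearest]
```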


Summary

Introduction

With the rapid development of computer science and technology, multimedia data such as images and texts have been proliferating on the Internet and have become a primary medium through which humans perceive the world. However, the numerous parameters of deep neural networks make the query process and its results difficult to explain; that is, those models have weak interpretability, which is an important property of a general and reliable cross-modal query system. Our core insight is that we can leverage a deep neural network model to capture multi-grained cross-modal common semantics and build an efficient hybrid index with interpretability and scalability. To ensure the adaptability and generality of our framework, when training common feature vectors for different data types we first capture coarse-grained and fine-grained semantic information by designing different networks and then combine them. To capture the latent semantic relations between images and texts, the framework combines an LSTM with an attention mechanism, which enhances accuracy for cross-modal queries and lays the foundation for interpretable query processing.
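
As a rough illustration of the LSTM-plus-attention idea, the minimal sketch below (assuming PyTorch; the module name, dimensions, and single-layer design are assumptions, not the paper's exact network) encodes the text with an LSTM whose final hidden state attends over CNN region features of the image. The attention weights double as the alignment signal that makes results interpretable.

```python
import torch
import torch.nn as nn

class TextGuidedAttention(nn.Module):
    """Minimal sketch: an LSTM encodes the text, and its final hidden state
    attends over image region features to produce a fine-grained common
    embedding. All dimensions below are illustrative assumptions."""

    def __init__(self, vocab_size=10000, word_dim=300, hidden_dim=512, region_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(region_dim, hidden_dim)  # map regions into text space

    def forward(self, tokens, regions):
        # tokens: (B, T) word ids; regions: (B, R, region_dim) CNN region features
        _, (h, _) = self.lstm(self.embed(tokens))       # h: (1, B, hidden_dim)
        query = h.squeeze(0).unsqueeze(1)               # (B, 1, hidden_dim)
        keys = self.proj(regions)                       # (B, R, hidden_dim)
        attn = torch.softmax(query @ keys.transpose(1, 2), dim=-1)  # (B, 1, R)
        attended = (attn @ keys).squeeze(1)             # (B, hidden_dim)
        # the attention weights over regions are the interpretable alignment
        return attended, attn.squeeze(1)
```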

Cross-modal Retrieval
Latent Semantic Alignment
Cross-modal Hashing
Distributed Similarity Query
Proposed Model
Fine-grained Embedding Learning with Local Semantics
Embedding Representations of Multimodal Data
Coarse-grained Embedding Learning with Global Semantics
Multi-grained Objective Function
Optimization
Interpretable Query Processing
Index Construction
Interpretable kNN Query
Distributed Algorithm
Selection of Pivot Points
Query-Sensitive Load Balancing
Computation of pn
Distributed kNN Query Algorithm
Experiment Setup
Verification of Observation 1
Performance of Query Accuracy
Performance of Query Time
Conclusion