Abstract

Due to the rapid development of mobile Internet technologies, cloud computing, and the popularity of online social networking and location-based services, a massive amount of multimedia data with geographical information is generated and uploaded to the Internet. In this paper, we propose a novel type of cross-modal multimedia retrieval called geo-multimedia cross-modal retrieval, which aims to retrieve a set of geo-multimedia objects based on geographical proximity and semantic similarity between different modalities. Previous studies on cross-modal retrieval and spatial keyword search cannot address this problem effectively because they neither consider multimedia data with geo-tags nor focus on this type of query. To address this problem efficiently, we present the definition of the $k$NN geo-multimedia cross-modal query for the first time and introduce related concepts such as the cross-modal semantic representation space. To bridge the semantic gap between different modalities, we propose a method named cross-modal semantic matching, which contains two important components, CorrProj and LogsTran, and aims to construct a common semantic representation space for cross-modal semantic similarity measurement. We also design a framework based on deep learning techniques to implement the construction of this common semantic representation space. In addition, a novel hybrid indexing structure named GMR-Tree, which combines geo-multimedia data and the R-Tree, is presented, and an efficient $k$NN search algorithm called $k$GMCMS is designed. Comprehensive experimental evaluation on real and synthetic datasets clearly demonstrates that our solution outperforms state-of-the-art methods.
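The abstract names CorrProj and LogsTran without defining them here. The following is a minimal sketch of how a two-branch mapping into a common semantic representation space could look, assuming CorrProj is a learned linear projection per modality and LogsTran a logistic (sigmoid) transformation; the dimensions (4096, 300, 128) and the NumPy implementation are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def logs_tran(x):
    """Assumed LogsTran: a logistic (sigmoid) transformation squashing
    projected features into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

class CorrProj:
    """Assumed CorrProj: one learned linear projection per modality into a
    shared d-dimensional semantic space (weights shown untrained)."""
    def __init__(self, in_dim, shared_dim):
        self.W = rng.normal(scale=0.01, size=(in_dim, shared_dim))
        self.b = np.zeros(shared_dim)

    def __call__(self, features):
        return logs_tran(features @ self.W + self.b)

# Hypothetical dimensions: 4096-d image CNN features and 300-d text features,
# both mapped into the same 128-d common semantic representation space.
image_proj = CorrProj(in_dim=4096, shared_dim=128)
text_proj = CorrProj(in_dim=300, shared_dim=128)

img_vec = image_proj(rng.normal(size=4096))  # image branch -> shared space
txt_vec = text_proj(rng.normal(size=300))    # text branch  -> shared space
```

Once both modalities live in the same space, cross-modal semantic similarity reduces to an ordinary single-space vector comparison, which is what the retrieval stage measures.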

Highlights

  • Due to the rapid popularity of mobile Internet techniques, online social networking, and location-based services, a massive amount of multimedia data is generated and uploaded to the Internet

  • To solve the problem of geo-multimedia cross-modal retrieval, we introduce a novel framework that consists of multi-modal feature extraction, cross-modal semantic space mapping, a geo-multimedia spatial index, and cross-modal semantic similarity measurement (a sketch of such a combined ranking function follows this list)

  • We provide an overview of previous work on multi-modal and cross-modal retrieval, deep-learning-based multimedia retrieval, and spatial textual search, all of which are related to this work
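The query ranks objects by both geographical proximity and cross-modal semantic similarity. As a concrete (assumed) instance of such a measure, the sketch below blends normalized spatial proximity with cosine similarity in the common semantic space; the weight `alpha` and the distance normalizer `d_max` are hypothetical parameters, not values from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors in the common semantic space."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_score(q_loc, q_vec, obj_loc, obj_vec, alpha=0.5, d_max=1.0):
    """Blend spatial proximity and cross-modal semantic similarity.
    alpha (blend weight) and d_max (distance normalizer) are illustrative
    knobs, not values taken from the paper."""
    dist = math.hypot(q_loc[0] - obj_loc[0], q_loc[1] - obj_loc[1])
    proximity = 1.0 - min(dist / d_max, 1.0)      # in [0, 1]
    semantic = cosine_similarity(q_vec, obj_vec)  # in [-1, 1]
    return alpha * proximity + (1.0 - alpha) * semantic
```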

Summary

INTRODUCTION

Due to the rapid popularity of mobile Internet techniques, online social networking, and location-based services, a massive amount of multimedia data is generated and uploaded to the Internet. Previous studies on traditional multi-modal and cross-modal retrieval do not consider geo-multimedia data, so these existing methods cannot improve retrieval performance by exploiting spatial information. We present a novel framework for geo-multimedia cross-modal retrieval based on deep learning and spatial indexing techniques. To improve search performance, we present a novel hybrid indexing structure named GMR-Tree, which combines signature techniques, multi-modal semantic representations, and the R-Tree. Based on it, we develop a novel search algorithm named kGMCMS to boost retrieval.
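The paper's kGMCMS operates on the GMR-Tree; the sketch below shows only the general best-first kNN pattern such an algorithm could follow: expand index nodes in decreasing order of an optimistic upper bound on the combined score, and stop once the k-th confirmed result beats every unexpanded subtree. The node layout (`.mbr`, `.entries`, `.is_leaf`, leaf objects with `.loc` and `.vec`), the bound derivation (semantic similarity capped at 1), and the reuse of the earlier `combined_score` sketch are all assumptions; the GMR-Tree's signature-based semantic pruning is omitted here.

```python
import heapq, itertools, math

_tie = itertools.count()  # tie-breaker so the heap never compares node objects

def min_dist(mbr, q):
    """Lower bound on the distance from query point q to a node's minimum
    bounding rectangle, given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = mbr
    dx = max(x1 - q[0], 0.0, q[0] - x2)
    dy = max(y1 - q[1], 0.0, q[1] - y2)
    return math.hypot(dx, dy)

def knn_search(root, q_loc, q_vec, k, score, alpha=0.5, d_max=1.0):
    """Best-first kNN over an R-Tree-style index. `score` is a combined
    spatial/semantic ranking function (e.g. the combined_score sketched
    above) using the same alpha and d_max, so the node bound stays valid."""
    def upper_bound(node):
        # Optimistic score of any object under this node: best achievable
        # proximity from its MBR, plus semantic similarity capped at 1.
        proximity = 1.0 - min(min_dist(node.mbr, q_loc) / d_max, 1.0)
        return alpha * proximity + (1.0 - alpha) * 1.0

    heap = [(-upper_bound(root), next(_tie), root)]
    top_k = []  # min-heap keeping the k best (score, tie, object) so far
    while heap:
        neg_ub, _, node = heapq.heappop(heap)
        if len(top_k) == k and -neg_ub <= top_k[0][0]:
            break  # no unexpanded subtree can beat the current k-th score
        if node.is_leaf:
            for obj in node.entries:  # leaf objects carry .loc and .vec
                s = score(q_loc, q_vec, obj.loc, obj.vec)
                if len(top_k) < k:
                    heapq.heappush(top_k, (s, next(_tie), obj))
                elif s > top_k[0][0]:
                    heapq.heapreplace(top_k, (s, next(_tie), obj))
        else:
            for child in node.entries:
                heapq.heappush(heap, (-upper_bound(child), next(_tie), child))
    return [obj for _, _, obj in sorted(top_k, key=lambda t: -t[0])]
```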

RELATED WORK
CROSS-MODAL SEMANTIC REPRESENTATION SPACE
THE FRAMEWORK
CROSS-MODAL SEMANTIC MATCHING
CROSS-MODAL SEMANTIC REPRESENTATION SPACE LEARNING
kNN GEO-MULTIMEDIA CROSS-MODAL SEARCH ALGORITHM
CONCLUSION