Abstract
Massive amounts of multimedia data with geographical information (geo-multimedia) are collected and stored on the Internet due to the wide application of location-based services (LBS). Finding the high-level semantic relationships between geo-multimedia data and constructing an efficient index are crucial for large-scale geo-multimedia retrieval. To address this challenge, this paper proposes a deep cross-modal hashing framework for geo-multimedia retrieval, termed Triplet-based Deep Cross-Modal Retrieval (TDCMR), which uses a deep neural network and an enhanced triplet constraint to capture high-level semantics. In addition, a novel hybrid index, called TH-Quadtree, is developed by combining cross-modal binary hash codes with a quadtree to support high-performance search. Extensive experiments are conducted on three commonly used benchmarks, and the results show the superior performance of the proposed method.
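The triplet constraint mentioned in the abstract can be made concrete with a short, hedged sketch. The paper's exact loss formulation is not reproduced here, so the hinge-style cross-modal triplet loss below (PyTorch; the function names and margin value are illustrative assumptions) only shows the general idea: pull an image's relaxed hash code toward a semantically matching text code and push it away from a non-matching one.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(img_anchor, txt_pos, txt_neg, margin=1.0):
    # Squared Euclidean distances between relaxed (continuous) hash codes.
    d_pos = (img_anchor - txt_pos).pow(2).sum(dim=1)
    d_neg = (img_anchor - txt_neg).pow(2).sum(dim=1)
    # Hinge form: the matching text code must be at least `margin`
    # closer to the image anchor than the non-matching one.
    return F.relu(d_pos - d_neg + margin).mean()

def to_hash_bits(relaxed_codes):
    # sign() binarizes trained relaxed codes into +/-1 hash bits.
    return torch.sign(relaxed_codes)
```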
Highlights
With the rapid development of the mobile internet, social networks, and location-based services (LBS), large volumes of multimedia data [1] with geographical information (a.k.a. geo-multimedia) [2], such as text, images [3,4], and videos [5,6,7,8], are collected and stored on the internet.
We propose a triplet-based deep cross-modal hashing framework, named Triplet-based Deep Cross-Modal Retrieval (TDCMR), which extracts deep sample features to alleviate the semantic gap through a triplet deep neural network that unifies the feature learning and hash learning processes.
To solve the problems of limited representation ability and slow query speed in geo-multimedia data representation and querying, this paper aims to narrow the cognitive gap between humans and computers in understanding multimedia semantics through a deep neural network, construct the triplet-based deep cross-modal hashing (Triplet-based Deep Cross-Modal Retrieval, TDCMR) network model, and encode geo-multimedia data semantically with the trained model; a minimal sketch of such a two-branch network follows.
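As a rough illustration of the two-branch hashing network described above, the sketch below maps pre-extracted image and text features to relaxed K-bit codes via tanh. The layer sizes, feature dimensions, and module names are illustrative assumptions, not the paper's specification.

```python
import torch.nn as nn

class HashBranch(nn.Module):
    """One modality branch: maps a pre-extracted feature vector to a
    relaxed K-bit hash code in (-1, 1) via tanh."""
    def __init__(self, in_dim, k_bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, k_bits), nn.Tanh())

    def forward(self, x):
        return self.net(x)

# Hypothetical input dimensions: 4096-d CNN image features, 1386-d text BoW.
img_branch = HashBranch(in_dim=4096, k_bits=64)
txt_branch = HashBranch(in_dim=1386, k_bits=64)
```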
Summary
With the rapid development of the mobile internet, social networks, and location-based services (LBS), large volumes of multimedia data [1] with geographical information (a.k.a. geo-multimedia) [2], such as text, images [3,4], and videos [5,6,7,8], are collected and stored on the internet. Nearest neighbor spatial keyword query (NNSKQ) is a very important retrieval technique in LBS applications, but it considers only location and keyword information when searching for spatial objects. Traditional multi-modal retrieval techniques, conversely, ignore geographic location information. To resolve this dilemma, many researchers have tried to integrate multi-modal information into the query and proposed effective nearest neighbor query methods for geo-multimedia data [15].
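To make the role of a hybrid index such as TH-Quadtree concrete, here is a hedged, self-contained sketch rather than the paper's actual structure: a small point quadtree prunes objects outside a spatial search box, and the surviving candidates are ranked by Hamming distance between their binary hash codes and the query's code. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GeoObject:
    x: float
    y: float
    code: int                  # binary hash code packed into an int

@dataclass
class QuadNode:
    x0: float
    y0: float
    x1: float
    y1: float                  # node bounding box
    objects: list = field(default_factory=list)
    children: list = field(default_factory=list)
    capacity: int = 8

    def insert(self, obj):
        if self.children:                       # route to the child quadrant
            self._child_for(obj).insert(obj)
        else:
            self.objects.append(obj)
            if len(self.objects) > self.capacity:
                self._split()

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [QuadNode(self.x0, self.y0, mx, my),
                         QuadNode(mx, self.y0, self.x1, my),
                         QuadNode(self.x0, my, mx, self.y1),
                         QuadNode(mx, my, self.x1, self.y1)]
        for o in self.objects:
            self._child_for(o).insert(o)
        self.objects = []

    def _child_for(self, obj):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(2 if obj.y >= my else 0) + (1 if obj.x >= mx else 0)]

    def range_query(self, qx0, qy0, qx1, qy1, out):
        if qx1 < self.x0 or qx0 > self.x1 or qy1 < self.y0 or qy0 > self.y1:
            return                              # no overlap: prune subtree
        for o in self.objects:
            if qx0 <= o.x <= qx1 and qy0 <= o.y <= qy1:
                out.append(o)
        for c in self.children:
            c.range_query(qx0, qy0, qx1, qy1, out)

def hamming(a, b):
    return bin(a ^ b).count("1")                # number of differing hash bits

def query(root, qx, qy, radius, qcode, k):
    """Spatially prune with the quadtree, then rank survivors by Hamming
    distance between their hash codes and the query code."""
    cands = []
    root.range_query(qx - radius, qy - radius, qx + radius, qy + radius, cands)
    return sorted(cands, key=lambda o: hamming(o.code, qcode))[:k]
```

In a real system the search radius would typically be expanded until at least k candidates survive the spatial filter, and leaf entries would carry the object payload alongside the hash code.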