Abstract

Cross-modal feature fusion is an important multifeature fusion technique whose purpose is to implicitly discover the relationships between samples from different modalities, i.e., to retrieve images encoded with similar semantics given an example image. Over the past decade, cross-modal image retrieval has become a research hotspot, and it is now a significant tool for further improving image retrieval performance. We propose a long short-term memory (LSTM)-based feature fusion model. First, motivated by the competitiveness of non-hybrid deep architectures for image retrieval, we introduce the mechanism of the LSTM in detail; ground-truth-based methods are used to strengthen cross-modal correspondence. We observe that the LSTM can closely mimic human visual understanding of image semantics. To improve the accuracy of cross-modal image retrieval, we adopt binary representations that improve cross-modal similarity measurement and the effectiveness of information recovery. Second, we use a quality model to assess the commonly used low- and high-level visual features of images, discarding disqualified features accordingly; this yields an optimal set of highly descriptive features for image retrieval. Finally, we combine the LSTM with the refined visual features to build a biologically inspired model for image retrieval in which multimodal features are optimally fused at the temporal level. Extensive experiments on multiple well-known image datasets demonstrate the superiority of our method.
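As a concrete illustration of the quality-based feature selection step described above, the sketch below scores each candidate visual feature channel and discards the disqualified ones. The variance-based quality score, the threshold, and all names here are assumptions for illustration only; the abstract does not specify the authors' actual quality measure.

```python
import numpy as np

def quality_filter(features, threshold=0.1):
    """Keep only descriptive feature channels.

    features: (n_samples, n_features) array of low-/high-level
    visual descriptors extracted from images.
    Hypothetical quality score: per-channel variance across samples
    (a low-variance channel carries little discriminative information).
    """
    quality = features.var(axis=0)      # one quality score per channel
    keep = quality > threshold          # disqualified channels are dropped
    return features[:, keep], keep

# Usage: 100 images, 64 candidate feature channels
feats = np.random.rand(100, 64)
refined, mask = quality_filter(feats)
print(refined.shape, mask.sum())        # refined, highly descriptive subset
```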
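And a minimal sketch of the LSTM-based fusion itself, assuming PyTorch: the refined features of each modality are fed to the LSTM as successive time steps, so that multimodal cues combine at the temporal level and the final hidden state serves as the fused retrieval embedding. Layer sizes and names are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    """Fuse refined multimodal features at the temporal level:
    each modality's feature vector is one LSTM time step, and the
    final hidden state is projected to a fused retrieval embedding."""

    def __init__(self, feat_dim=512, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Linear(hidden_dim, embed_dim)

    def forward(self, feats):
        # feats: (batch, n_modalities, feat_dim) -- refined features,
        # ordered as a short sequence over modalities.
        _, (h_n, _) = self.lstm(feats)       # h_n: (1, batch, hidden_dim)
        return self.embed(h_n.squeeze(0))    # fused embedding for retrieval

# Usage: 8 images, 3 feature modalities, 512-dim features each
model = LSTMFusion()
fused = model(torch.randn(8, 3, 512))
print(fused.shape)                           # torch.Size([8, 128])
```

For the binary representation mentioned in the abstract, such a fused embedding could be binarized (e.g., by taking its sign) to enable efficient Hamming-distance retrieval, though the abstract does not detail the authors' binarization scheme.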
