Abstract

With the rapidly growing number of images on the Internet, efficient and scalable semantic image retrieval is becoming increasingly important. This paper presents a novel approach to semantic image retrieval that combines a Convolutional Neural Network (CNN) with a Markov Random Field (MRF). Image concept detection, that is, automatically recognizing multiple semantic concepts in an unlabeled image, is a key step in semantic image retrieval. Unlike previous work that applies single-concept classifiers one by one, we detect multiple semantic concepts jointly by using a multiconcept scene classifier. In other words, our approach treats multiple concepts as a holistic scene for multiconcept scene learning. Specifically, we first train a CNN as a concept classifier, which includes two types of classifiers: a single-concept fully connected classifier that is best suited to single-concept detection and a multiconcept scene fully connected classifier that is suited to holistic scene detection. Then we propose an MRF-based late fusion approach that effectively learns the semantic correlation between the single-concept classifier and the multiconcept scene classifier. Finally, the semantic correlation among the subconcepts of images is captured to further improve detection precision. To investigate the feasibility and effectiveness of our proposed approach, we conduct comprehensive experiments on two publicly available image databases. The results show that our proposed approach outperforms several state-of-the-art approaches.

Highlights

  • With the rapid development of information technology, a large number of multimedia objects such as images are available on the Web

  • Using our proposed Markov Random Field (MRF)-based fusion method, we model the semantic correlation between the single-concept classifier and the multiconcept scene classifier and estimate the relevance score for an image multiconcept scene

  • Average Precision (AP) can be computed as AP = (∑_i ε(i) p(i)) / r, where r is the total number of relevant images in the test set U, i is the rank in the retrieved image list R, ε(i) is an indicator function that equals 1 if the ith image is relevant to the query Q and equals 0 otherwise, and p(i) is the precision at cut-off i in R, which is defined as the ratio between the number of relevant images among the top i retrieved images and i
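
The AP formula in the last highlight can be illustrated with a minimal Python sketch. It assumes the standard reading of p(i), namely the number of relevant images among the top i results divided by i. The names `average_precision`, `relevance`, and `total_relevant` are hypothetical and chosen only for this illustration; they do not come from the paper.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int], total_relevant: int) -> float:
    """Compute AP = (sum_i eps(i) * p(i)) / r for one ranked result list.

    relevance[i - 1] plays the role of eps(i): 1 if the image at rank i
    is relevant to the query, 0 otherwise.
    total_relevant plays the role of r: the number of relevant images
    in the whole test set (an input assumed to be known).
    """
    if total_relevant <= 0:
        # No relevant images exist, so AP is taken to be 0 by convention here.
        return 0.0

    hits = 0      # relevant images seen so far in the ranked list
    score = 0.0   # running sum of eps(i) * p(i)
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # p(i): precision at cut-off i, assumed to be hits / rank
            score += hits / rank
    return score / total_relevant


ap_example = average_precision([1, 0, 1, 0], total_relevant=2)
```

With the ranked list `[1, 0, 1, 0]` and r = 2, the relevant images sit at ranks 1 and 3, so p(1) = 1 and p(3) = 2/3, which gives AP = (1 + 2/3) / 2 = 5/6.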

Summary

Introduction

With the rapid development of information technology, a large number of multimedia objects such as images are available on the Web. Image concept detection is a vital step in semantic image retrieval. To address this issue, many approaches have been proposed, such as Markov random walk [1], group sparsity [2], ensemble learning [3], and multiview semantic learning [4]. Although effective, these approaches work only in the case of single-concept-based image retrieval. This means that each semantic query is supposed to contain only one semantic concept, which restricts their practical usability.
