A generic framework for ontology-based information retrieval and image retrieval in web data

V Vijayarajan, Priyam Tejaswin, M Dinakaran, Mayank Lohani

doi:10.1186/s13673-016-0074-1

Open Access

https://doi.org/10.1186/s13673-016-0074-1

Abstract

In the internet era, search engines play a vital role in information retrieval from web pages. Search engines arrange the retrieved results using various ranking algorithms. Additionally, retrieval is based on statistical searching techniques or content-based information extraction methods. It is still difficult for the user to understand the abstract details of every web page unless the user opens it separately to view the web content. This key point provided the motivation to propose and display an ontology-based object-attribute-value (O-A-V) information extraction system as a web model that acts as a user dictionary to refine the search keywords in the query for subsequent attempts. This first model is evaluated using various natural language processing (NLP) queries given as English sentences. Additionally, image search engines, such as Google Images, use content-based image information extraction and retrieval of web pages against the user query. To minimize the semantic gap between the image retrieval results and the expected user results, the domain ontology is built using image descriptions. The second proposed model initially examines natural language user queries using an NLP parser algorithm that identifies the subject-predicate-object (S-P-O) for the query. S-P-O extraction is an extension of the idea behind the ontology-based O-A-V web model. Because writing SPARQL Protocol and RDF Query Language (SPARQL) queries by hand is complex from the user's point of view, a SPARQL auto query generation module is proposed that uses the extracted S-P-O to generate the SPARQL query automatically. The query is then run against the ontology, and images are retrieved based on the auto-generated SPARQL query. With the proposed methodology above, this paper seeks answers to the following two questions. First, how can domain ontology and semantics be combined to improve information retrieval and user experience? Second, does this new unified framework improve on standard information retrieval systems? To answer these questions, a document retrieval system and an image retrieval system were built to test the proposed framework. The web document retrieval was tested against three keyword/bag-of-words models and a semantic ontology model. Image retrieval was tested on the IAPR TC-12 benchmark dataset. The precision, recall and accuracy results were then compared against standard information retrieval systems using TREC_EVAL. The results indicated improvements over the standard systems. A controlled experiment was performed in which test subjects queried the retrieval system in the absence and presence of the proposed framework. The queries were measured using two metrics, time and click-count. Comparisons were made on the retrieval performed with and without the proposed framework. The results were encouraging.
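To make the S-P-O to SPARQL step concrete, the following is a minimal sketch of how an extracted triple might be turned into a query string. It is not the authors' module: the `ex:` namespace, the property names (`ex:hasDescription`, `ex:hasSubject`, `ex:hasPredicate`, `ex:hasObject`) and the helper names are assumptions chosen for illustration, and the S-P-O triple is supplied by hand rather than produced by an NLP parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPO:
    """A subject-predicate-object triple extracted from a user query."""
    subject: str
    predicate: str
    obj: str


def to_local_name(phrase: str) -> str:
    """Turn a phrase such as 'snow mountain' into a CamelCase local name."""
    return "".join(word.capitalize() for word in phrase.split())


def build_sparql(triple: SPO, prefix_iri: str = "http://example.org/images#") -> str:
    """Build a SELECT query for images whose description matches the triple.

    The vocabulary used here (ex:hasDescription, ex:hasSubject, ...) is
    hypothetical; a real ontology would define its own classes and properties.
    """
    s = to_local_name(triple.subject)
    p = to_local_name(triple.predicate)
    o = to_local_name(triple.obj)
    return (
        f"PREFIX ex: <{prefix_iri}>\n"
        "SELECT DISTINCT ?image WHERE {\n"
        "  ?image ex:hasDescription ?d .\n"
        f"  ?d ex:hasSubject ex:{s} ;\n"
        f"     ex:hasPredicate ex:{p} ;\n"
        f"     ex:hasObject ex:{o} .\n"
        "}"
    )


# Hand-written triple standing in for the output of an NLP parser
# on a query such as "a man riding a horse".
query = build_sparql(SPO("man", "riding", "horse"))
print(query)
```

The generated string could then be passed to any SPARQL endpoint or library that hosts the image ontology; that step is left out here because it depends on the deployment.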

Highlights

The web is vast, but it is not intelligent enough to recognize the queries made by users and relate them to real or abstract entities in the world
The Semantic Web is the level of the web that treats it as a knowledge graph rather than a collection of web resources interconnected by hyperlinks and URLs
Providing the top web results does not complete the task if the user still has to browse through them; providing semantically extracted O-A-V triplets with each web link will provide the user with valuable insight and save time
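As an illustration of the last highlight, a search result can carry its extracted O-A-V triplets so that they are shown next to the link. The sketch below is only one possible representation; the `WebResult` class, the `render` helper, and the example URL and triplets are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An object-attribute-value triplet, e.g. ("Eiffel Tower", "location", "Paris").
OAV = Tuple[str, str, str]


@dataclass
class WebResult:
    """A web link together with the O-A-V triplets extracted from its page."""
    url: str
    title: str
    triplets: List[OAV] = field(default_factory=list)


def render(result: WebResult, limit: int = 3) -> str:
    """Format a result as a title line followed by up to `limit` triplets."""
    lines = [f"{result.title} <{result.url}>"]
    for obj, attr, value in result.triplets[:limit]:
        lines.append(f"  {obj} | {attr} | {value}")
    return "\n".join(lines)


# Illustrative data only; a real system would fill this from extraction.
result = WebResult(
    url="https://example.org/eiffel",
    title="Eiffel Tower overview",
    triplets=[
        ("Eiffel Tower", "location", "Paris"),
        ("Eiffel Tower", "completed", "1889"),
    ],
)
snippet = render(result)
print(snippet)
```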

Summary

Background

The web is vast, but it is not intelligent enough to recognize the queries made by users and relate them to real or abstract entities in the world. In the Semantic Web, ontologies act as the building blocks of the infrastructure. They transform the existing web data into a web of knowledge, share the knowledge among various web applications, and enable intelligent web services. The Resource Description Framework (RDF) [11] is an official W3C Recommendation for Semantic Web data models. Some challenges to be considered while constructing a knowledge graph are discussed in [27]. It works at the outer level, drawing semantic relationships among various resources, and provides us with the best web results. It comes as a plugin to web browsers. It decides the user's search domain by asking the user to select an ontology and concepts to confine the search. There are some annotation-based image retrieval systems using ontology, but they do not use SPARQL queries. A feature-based reranking algorithm for image similarity prediction using a query-context bag-of-object retrieval technique is discussed in [33].
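The RDF data model mentioned above represents knowledge as (subject, predicate, object) triples, and a query is essentially a triple pattern with some positions left open. The following plain-Python sketch shows that idea without any RDF library; the `ex:` identifiers and the `match` helper are assumptions made for illustration, and only `rdf:type` is a real RDF term.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in graph:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# A tiny hand-made graph with hypothetical identifiers.
graph = {
    ("ex:img1", "rdf:type", "ex:Image"),
    ("ex:img1", "ex:depicts", "ex:Horse"),
    ("ex:img2", "rdf:type", "ex:Image"),
    ("ex:img2", "ex:depicts", "ex:Mountain"),
}

# Which resources depict a horse?
horses = sorted(t[0] for t in match(graph, p="ex:depicts", o="ex:Horse"))
print(horses)
```

A SPARQL engine generalizes this by joining several such patterns through shared variables, which is what makes it suitable for querying a domain ontology.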

Proposed architectures

Algorithm design

[Table residue: row label "Complex queries mean"; column headers "Semantic", "Lucene", "TREC automatic", "TREC manual"; values not recovered]

Findings

Conclusions