Investigate the use of Anchor-Text and of Query-Document Similarity Scores to Predict the Performance of Search Engine

Abdulmohsen Almalawi, Adel Fahad, Rayed Alghamdi

doi:10.14569/ijacsa.2017.081140

Abstract

Query difficulty prediction aims to estimate, in advance, whether the answers returned by a search engine in response to a query are likely to be useful. This paper proposes new predictors based on the similarity between the query and the answer documents, as calculated by three different retrieval models. It also examines the use of anchor-text-based document surrogates and how their similarity to queries can be used to estimate query difficulty. The predictors are evaluated on the WT10g collection of web data by measuring the correlation between the retrieval effectiveness of the full-text results, given by average precision (AP) and precision at 10 (P@10), and the similarity scores computed over anchor text and over full text. The experimental evaluation shows that five of the proposed predictors perform reliably and consistently across a variety of retrieval models.
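The two effectiveness measures named in the abstract, AP and P@10, can be illustrated with a minimal sketch. The function names and the toy document identifiers below are assumptions made for illustration, not the paper's evaluation code, and the definitions follow the standard textbook formulations.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


# Hypothetical ranking and relevance judgements for one query.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
ap = average_precision(ranked, relevant)
p_at_2 = precision_at_k(ranked, relevant, k=2)
```

Here the relevant document "d9" is never retrieved, so it lowers AP without affecting P@k.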

Highlights

The need to find useful information is an old problem
We explore using the mean of top N (1, 10, 50, 100, 500, and 1000) ranked documents as the similarity score for each query in all our approaches
The results are reported for three retrieval models (Okapi, Cosine and Dirichlet) and two topic sets: Text REtrieval Conference (TREC) 9 as the training set and TREC 10 as the evaluation set
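The mean-of-top-N predictor described in the highlights can be sketched as follows. This is an illustration under stated assumptions: the helper names, the toy scores and AP values are hypothetical, and Pearson correlation is used only as an example, since the text above does not say which correlation coefficient the authors computed.

```python
import math

# Cutoffs listed in the highlights.
CUTOFFS = (1, 10, 50, 100, 500, 1000)


def mean_top_n(scores, n):
    """Mean of the n highest similarity scores (fewer if the list is shorter)."""
    top = sorted(scores, reverse=True)[:n]
    return sum(top) / len(top) if top else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient (an assumed choice of coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


# Hypothetical query-document similarity scores and AP values per query.
scores_by_query = {
    "q1": [12.4, 11.9, 9.3, 4.1],
    "q2": [6.2, 5.8, 5.5, 2.0],
    "q3": [3.1, 2.7, 1.4, 0.9],
}
ap_by_query = {"q1": 0.61, "q2": 0.34, "q3": 0.08}

queries = sorted(scores_by_query)
for n in CUTOFFS:
    predictor = [mean_top_n(scores_by_query[q], n) for q in queries]
    actual = [ap_by_query[q] for q in queries]
    print(n, round(pearson(predictor, actual), 3))
```

A predictor is considered useful when its values correlate strongly with the effectiveness actually observed for the same queries, so each cutoff yields one correlation value per retrieval model.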

Summary

Introduction

The need to find useful information is an old problem. As more and more electronic data becomes available, finding relevant information becomes more challenging. Because it is impossible to go through an enormous number of documents to see whether they satisfy an information need, many information retrieval techniques have been introduced. Ranking documents according to their similarity to the information need is one technique that attempts to overcome the challenge of searching large information repositories. A number of information retrieval models have been introduced. These models can be classified into set-theoretic, algebraic and probabilistic models. Ranking relevant documents according to their similarity to a user's information need is not the only problem facing information retrieval systems. Poorly performing queries are also a significant challenge. This issue has been investigated by Information Retrieval (IR) researchers. Studying query difficulty prediction is an interesting problem in its own right.
