Improving Search Engines by Demoting Non-Relevant Documents

Fadi Yamout, Mireille Makary

doi:10.5121/ijnlc.2019.8401

Abstract

A good search engine aims to place the most relevant documents at the top of the result list. This paper describes a new technique called "Improving search engines by demoting non-relevant documents" (DNR) that improves precision by detecting and demoting non-relevant documents. DNR generates a new set of queries composed of the terms of the original query combined in different ways. The documents retrieved by those new queries are evaluated using a heuristic algorithm to detect the non-relevant ones. These non-relevant documents are then moved down the list, which consequently improves precision. The new technique is tested on the WT2g test collection using several retrieval models: the vector model based on the TFIDF weighting measure, and the probabilistic models based on the BM25 and DFR-BM25 weighting measures. The recall and precision ratios are used to compare the performance of the new technique against the performance of the original query.

Highlights

Search engines extract user-specified information from documents and files, ranging from books to online blogs, journals, and academic articles [1].
The new technique is tested on the WT2g test collection using the vector model [9,10,11,12] based on the TFIDF weighting measure [13,14], and the probabilistic models [15] based on the Best Match 25 (BM25) and DFR-BM25 weighting measures [16,17,18].
When DNR was tested with the probabilistic model based on the BM25 weighting measure [18], it correctly classified 3631 non-relevant documents as non-relevant.

Summary

INTRODUCTION

Search engines extract user-specified information from documents and files, ranging from books to online blogs, journals, and academic articles [1]. Search engines cannot be 100% accurate because document relevance is subjective and depends on the user's judgment, which in turn depends on many factors such as the user's knowledge of the topic, the reason for searching, and satisfaction with the returned results [3]. There are many challenges involved in making a search engine successful [2,4]. These challenges include acquiring large numbers of relevant documents from many sources, extracting useful representations of the documents to facilitate search, ranking documents in response to a user request, and presenting the search results effectively by placing the most relevant documents at the top of the list [5,6,7]. The recall and precision ratios are used to compare the performance of the new technique against the performance of the original query.
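As a rough illustration of the two ratios named above, the following Python sketch computes precision and recall over the top k documents of a ranked list. The function name `precision_recall_at_k`, the document identifiers, and the choice of a rank cutoff are assumptions made for this example; the paper's own evaluation procedure may differ.

```python
def precision_recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Precision and recall over the top-k documents of a ranked list.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    top_k = ranked_doc_ids[:k]
    # Count how many of the top-k documents are judged relevant.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Made-up judgments: three relevant documents, one non-relevant (d2).
relevant = {"d1", "d3", "d4"}
before = ["d1", "d2", "d3", "d4"]   # d2 ranked second
after = ["d1", "d3", "d4", "d2"]    # d2 moved to the bottom

p_before, r_before = precision_recall_at_k(before, relevant, 2)
p_after, r_after = precision_recall_at_k(after, relevant, 2)
```

In this toy case, moving the single non-relevant document out of the top two raises precision at cutoff 2 from 0.5 to 1.0, which is the kind of effect the technique aims for.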

Vector Model

Probabilistic

WEIGHTING TERMS

10 DocFreqi

DFR-BM25

TEST COLLECTION

ASSESSMENT

THE NEW TECHNIQUE
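The abstract describes DNR as generating new queries from the terms of the original query "combined in different ways" and then moving the detected non-relevant documents down the list. The Python sketch below illustrates those two steps under stated assumptions: `generate_subqueries` and `demote` are hypothetical names, enumerating every subset of the query terms is only one possible reading of "combined in different ways", and the heuristic that detects non-relevant documents is not reproduced here, so the set of flagged documents is simply passed in.

```python
from itertools import combinations


def generate_subqueries(query, min_terms=1):
    """Return every combination of the query's distinct terms.

    Assumption: "combined in different ways" is read as "all subsets of
    the terms, in their original order". The full query is included.
    """
    # dict.fromkeys removes duplicate terms while preserving order.
    terms = list(dict.fromkeys(query.lower().split()))
    subqueries = []
    for size in range(min_terms, len(terms) + 1):
        for combo in combinations(terms, size):
            subqueries.append(" ".join(combo))
    return subqueries


def demote(ranked_doc_ids, flagged_non_relevant):
    """Move flagged documents to the bottom of the ranking.

    Relative order is preserved within both the kept and demoted groups.
    The flagged set stands in for the output of the detection heuristic.
    """
    kept = [d for d in ranked_doc_ids if d not in flagged_non_relevant]
    demoted = [d for d in ranked_doc_ids if d in flagged_non_relevant]
    return kept + demoted


subqueries = generate_subqueries("search engine precision")
reranked = demote(["d1", "d2", "d3", "d4"], {"d2"})
```

A query of n distinct terms yields 2^n - 1 sub-queries under this reading, so a practical system would likely cap the query length or the combination size.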

EXPERIMENTS AND RESULTS

Using the vector model based on TFIDF

Using the probabilistic model based on BM25

97 Relevant Rejected : 526

Using the probabilistic model based on DFR-BM25

93 Relevant Rejected : 533

CONCLUSIONS


Journal: International Journal on Natural Language Computing	Publication Date: Aug 31, 2019
License type: cc-by

Similar Papers

Promoting Agriculture Knowledge via Public Web Search Engines: An Experience by an Iranian Librarian in Response to Agricultural Queries
Sedigheh Mohamadesmaeil ... Saeed Ghaffari
COLLNET Journal of Scientometrics and Information Management | VOL. 6
01 Dec 2012

Co-occurrence based predictors for estimating query difficulty
Hazra Imran ... Aditi Sharan
01 Dec 2010

Learning to find answers to questions on the Web
Eugene Agichtein ... Steve Lawrence
ACM Transactions on Internet Technology | VOL. 4
01 May 2004

Learning search engine specific query transformations for question answering
Eugene Agichtein ... Luis Gravano
01 Apr 2001
