Assessing relevance and trust of the deep web sources and results based on inter-source agreement

Raju Balakrishnan, Manishkumar Jha, Subbarao Kambhampati

doi:10.1145/2460383.2460390

Abstract

Deep web search engines face the formidable challenge of retrieving high-quality results from a vast collection of searchable databases. Deep web search is a two-step process: selecting high-quality sources and ranking the results from the selected sources. Although methods exist for both steps, they assess the relevance of sources and results using query-result similarity. When applied to the deep web, these methods have two deficiencies. First, they are agnostic to the correctness (trustworthiness) of the results. Second, query-based relevance does not consider the importance of the results and sources. Both considerations are essential for the deep web and for open collections in general. Since a number of deep web sources provide answers to any query, we conjecture that the agreement between these answers helps in assessing the importance and trustworthiness of the sources and the results. To assess source quality, we compute the agreement between sources as the agreement of the answers they return. While computing the agreement, we also measure and compensate for possible collusion between sources. This adjusted agreement is modeled as a graph with sources at the vertices. On this agreement graph, a quality score of a source, which we call SourceRank, is calculated as the stationary visit probability of a random walk. To rank results, we analyze the second-order agreement between the results. Further extending SourceRank to multidomain search, we propose a source ranking sensitive to the query domains: multiple domain-specific rankings of a source are computed, and these rankings are combined into the final ranking. We perform extensive evaluations on online sources and on hundreds of Google Base sources spanning multiple domains. The proposed result and source rankings are implemented in the deep web search engine Factal. We demonstrate that the agreement analysis tracks source corruption. Further, our relevance evaluations show that our methods improve precision significantly over Google Base and other baseline methods. The result ranking and the domain-specific source ranking are evaluated separately.
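The abstract defines SourceRank as the stationary visit probability of a random walk over an agreement graph whose vertices are sources. The sketch below illustrates only that last step, in Python with NumPy, under stated assumptions. The agreement weights are taken as already computed and collusion-adjusted. The function name `source_rank`, the uniform `smoothing` jump (a PageRank-style teleport added here to keep the walk irreducible), and the use of power iteration as the solver are all hypothetical choices, not details confirmed by the abstract. The sketch does not reproduce the paper's answer-agreement computation, collusion detection, second-order result ranking, or domain-specific combination.

```python
import numpy as np


def source_rank(agreement, smoothing=0.1, tol=1e-10, max_iter=1000):
    """Stationary visit probabilities of a random walk on an agreement graph.

    Assumption: agreement[i][j] is an already collusion-adjusted weight on the
    edge from source i to source j. The uniform `smoothing` jump is a
    hypothetical PageRank-style teleport, not a parameter from the paper.
    """
    a = np.array(agreement, dtype=float)
    n = a.shape[0]
    np.fill_diagonal(a, 0.0)  # a source does not vote for itself
    row_sums = a.sum(axis=1, keepdims=True)
    # Normalize each row into transition probabilities; a source with no
    # outgoing agreement jumps uniformly to any source.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    trans = np.where(row_sums > 0, a / safe_sums, 1.0 / n)
    # Mix in the uniform jump so every row still sums to 1.
    trans = (1.0 - smoothing) * trans + smoothing / n

    # Power iteration: repeatedly push the visit distribution along the walk.
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = rank @ trans
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical example: sources 0 and 1 agree strongly with each other,
    # and both agree only weakly with source 2.
    agreement = [
        [0.0, 0.8, 0.1],
        [0.8, 0.0, 0.1],
        [0.1, 0.1, 0.0],
    ]
    print(source_rank(agreement))
```

In this hypothetical three-source example, the walk spends the least time at source 2, which receives little agreement from the others, so it gets the lowest score, while the two mutually agreeing sources share the highest, equal scores. The smoothing jump is a design choice that guarantees a unique stationary distribution even when the agreement graph is disconnected.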

Journal: ACM Transactions on the Web
Publication Date: May 1, 2013
Citations: 19

Similar Papers

TODWEB
Umara Noor ... Azhar Rauf
05 Dec 2011

Automatic Generation of Ontology from the Deep Web
Yoo Jung An ... James Geller
01 Sep 2007

Extraction of relational schema from deep web sources: a form driven approach
Yasser Saissi ... Ahmed Zellou
01 Nov 2014

Discovering the Deep Web through XML Schema Extraction
Yasser Saissi ... Ali Idri
01 Jan 2015
