Keyword-Based Deep Web Database Selection

Ju Fan, Li-Zhu Zhou

doi:10.3724/sp.j.1016.2011.01797

Abstract

This paper proposes a keyword-based Deep Web search method: given keyword queries provided by users, the proposed method selects, on the fly, the databases that capture the query intent and provide high-quality data. The method, which is much more efficient than Deep Web crawling, can support keyword search over multiple-domain Deep Web databases, and thus can be smoothly integrated with the existing search engine architecture. In this paper, we focus on keyword-based Deep Web database selection and study the research challenges that naturally arise in the proposed method. (1) We introduce an effective model to measure the relevance of database-domain attributes with respect to keyword queries, and propose a random-walk algorithm to compute the relevance from database query logs. (2) We develop a novel database sampling method for measuring the relevance of databases with respect to queries, in order to select relevant databases in the selected domains. We have implemented our methods on real data sets from the Chinese Deep Web. The experimental results show that our methods achieve high effectiveness.
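The abstract does not spell out the random-walk algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical query log of `(keyword, attribute, count)` triples, builds a weighted keyword-attribute graph from it, and runs a random walk with restart from the query keywords to score attributes. The function name, the log format, and the `restart` and `iterations` parameters are all assumptions made for this example.

```python
from collections import defaultdict


def random_walk_relevance(log, query_keywords, restart=0.15, iterations=50):
    """Score attributes for a keyword query via a random walk with restart.

    log: iterable of (keyword, attribute, count) triples. This format is a
    hypothetical stand-in for a database query log, not taken from the paper.
    """
    # Build an undirected weighted graph. Nodes are tagged so that a keyword
    # and an attribute with the same text stay distinct.
    graph = defaultdict(dict)
    for keyword, attribute, count in log:
        if count <= 0:
            continue  # ignore empty co-occurrences so every node has weight
        k, a = ("kw", keyword), ("attr", attribute)
        graph[k][a] = graph[k].get(a, 0) + count
        graph[a][k] = graph[a].get(k, 0) + count

    # The walk starts (and restarts) at the query keywords seen in the log.
    seeds = [("kw", w) for w in query_keywords if ("kw", w) in graph]
    if not seeds:
        return {}
    start = {s: 1.0 / len(seeds) for s in seeds}

    scores = dict(start)
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in scores.items():
            total = sum(graph[node].values())
            # Move along edges in proportion to their weights.
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += (1 - restart) * mass * weight / total
        # With probability `restart`, jump back to a seed keyword.
        for s, p in start.items():
            nxt[s] += restart * p
        scores = dict(nxt)

    # Report only attribute nodes, keyed by attribute name.
    return {name: score for (kind, name), score in scores.items() if kind == "attr"}


# Made-up log entries, purely for illustration.
example_log = [
    ("price", "car.price", 5),
    ("price", "house.price", 1),
    ("mileage", "car.mileage", 3),
]
example_scores = random_walk_relevance(example_log, ["price"])
```

In this toy log, `car.price` co-occurs with the keyword "price" more often than `house.price` does, so it receives the higher score, while `car.mileage` is unreachable from the query and gets no score at all.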
