A Graph-Based Approach for Web Database Sampling

Wei Liu

doi:10.3724/sp.j.1001.2008.00179

Abstract

A vast amount of information is hidden behind Web-based query interfaces with specific query capabilities, which makes it difficult to learn the characteristics of a Web database's content, such as its topic distribution and update frequency. This poses a great challenge for Deep Web data integration. To address this problem, this paper proposes WDB-Sampler, a graph-based approach to Web database sampling that incrementally obtains approximately random sample records from a Web database through its query interface. That is, each query retrieves a number of sample records, which are stored locally, and one of them is transformed into the next query. An important characteristic of the approach is that it is not restricted by the form in which attributes appear on the query interface, which makes it a general method for Web database sampling. Extensive experiments on locally simulated Web databases and on real Web databases show that the approach obtains high-quality samples from a Web database at a relatively low cost.
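The incremental loop described in the abstract can be illustrated with a minimal sketch. This is not the WDB-Sampler algorithm itself, whose details are not given here. It is a plain random walk under stated assumptions: the toy `RECORDS` table, the `query_interface` function, its `limit` parameter, and the rule for choosing the next query are all hypothetical stand-ins.

```python
import random

# Hypothetical toy "Web database". A real one is reachable only through
# its query interface; this list exists purely for illustration.
RECORDS = [
    {"id": 1, "author": "Liu", "topic": "sampling", "year": "2008"},
    {"id": 2, "author": "Liu", "topic": "integration", "year": "2007"},
    {"id": 3, "author": "Zhao", "topic": "sampling", "year": "2008"},
    {"id": 4, "author": "Su", "topic": "integration", "year": "2014"},
    {"id": 5, "author": "Yan", "topic": "interfaces", "year": "2008"},
    {"id": 6, "author": "Zhao", "topic": "interfaces", "year": "2011"},
]


def query_interface(attribute, value, limit=3):
    """Simulated query interface: return at most `limit` matching records."""
    matches = [r for r in RECORDS if r[attribute] == value]
    return matches[:limit]


def sample_incrementally(seed_query, num_queries, rng):
    """Collect records by walking the implicit graph in which two records
    are neighbours when they share an attribute value (an assumption)."""
    local_sample = {}      # records stored locally, keyed by id
    queries_issued = []
    attribute, value = seed_query
    for _ in range(num_queries):
        queries_issued.append((attribute, value))
        results = query_interface(attribute, value)
        if not results:
            break          # the seed query matched nothing
        for record in results:
            local_sample[record["id"]] = record
        # Transform one retrieved record into the next query.
        pivot = rng.choice(results)
        attribute = rng.choice([a for a in pivot if a != "id"])
        value = pivot[attribute]
    return list(local_sample.values()), queries_issued


if __name__ == "__main__":
    sample, queries = sample_incrementally(
        ("topic", "sampling"), num_queries=5, rng=random.Random(0)
    )
    print(len(queries), "queries issued,", len(sample), "distinct records")
```

A plain walk like this favours records that share values with many others, so a sampler aiming for approximately random samples would need some correction for that bias. None is attempted in this sketch.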


Journal: Journal of Software
Publication Date: Jul 10, 2008
Citations: 9
