Abstract

The World Wide Web is witnessing an increase in the amount of structured content: vast collections of structured data are on the rise due to the deep web. As a result, Internet-scale deep web data integration tasks are becoming increasingly common. In such tasks, a primary challenge is to determine which web databases to include in the integration system. This paper presents a utility maximization model for resource selection in deep web data integration. The model provides an efficient and effective way to estimate the approximate utility that a web database brings to an integration system in a given state when it is integrated. The utility of a web database is synthesized from its positive and negative utility. With the estimated utility information, web database selection can be made by explicitly optimizing the goal of high utility (including as much important data as possible in the selected databases while keeping their query cost as low as possible) in an iterative manner, where web databases are integrated incrementally. We experimentally demonstrate that our approach is efficient and finds high-utility data integration solutions.
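
The iterative selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the abstract gives no formulas, so the sketch assumes that positive utility is the weighted count of records a database adds to what is already covered, that negative utility is a scaled query cost, and that selection is greedy. The names `WebDatabase`, `marginal_utility`, `select_databases`, `weights`, and `cost_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebDatabase:
    name: str
    records: frozenset  # assumption: IDs of the records the database exposes
    query_cost: float   # assumption: a single scalar cost of querying it


def marginal_utility(db, covered, weights, cost_weight):
    """Assumed utility: weighted gain from new records minus scaled query cost."""
    gain = sum(weights.get(r, 1.0) for r in db.records - covered)
    return gain - cost_weight * db.query_cost


def select_databases(candidates, weights, cost_weight=1.0):
    """Greedily integrate the best database until no candidate adds utility."""
    selected, covered = [], set()
    remaining = list(candidates)
    while remaining:
        best = max(
            remaining,
            key=lambda db: marginal_utility(db, covered, weights, cost_weight),
        )
        # Utility depends on the current state of the integration system,
        # so it is re-estimated against `covered` in every round.
        if marginal_utility(best, covered, weights, cost_weight) <= 0:
            break
        selected.append(best)
        covered |= best.records
        remaining.remove(best)
    return selected


# Toy data, invented for illustration only.
dbs = [
    WebDatabase("A", frozenset({1, 2, 3}), 1.0),
    WebDatabase("B", frozenset({3, 4}), 0.5),
    WebDatabase("C", frozenset({1, 2}), 2.0),
]
chosen = [db.name for db in select_databases(dbs, weights={4: 3.0})]
```

In this toy run, database C is never selected: once A is integrated, C contributes no new records, so its utility reduces to its query cost. This shows why the utility of a database has to be estimated relative to the current state of the integration system rather than in isolation.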
