Abstract

The Deep Web has become an important resource on the web because of its rich, high-quality information, giving rise to a new application area in data mining, information retrieval, and information integration. In web-scale Deep Web data integration tasks, where hundreds or thousands of data sources may provide data relevant to a particular domain, integrating all available Deep Web sources is inefficient. This paper proposes a data source selection approach based on the quality of Deep Web sources. It automatically finds the highest-quality set of Deep Web sources related to a particular domain, which is a prerequisite for effective Deep Web data integration. The quality of a data source is assessed by evaluating quality dimensions that represent the characteristics of Deep Web sources. Experiments on real Deep Web sources collected from the Internet show that our approach provides an effective and scalable solution for selecting data sources for Deep Web data integration.
