Abstract

The proliferation of online information has led to increased use of wrappers for extracting data from Web sources and transforming it into a structured format. The resulting data can then be used to build new enterprise applications. While most previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important problem, because Web sources often change in ways that prevent the wrappers from operating correctly. In this chapter, we describe machine learning techniques for verifying that a wrapper is working correctly and for repairing it if it is not. Our approach is to learn structural descriptions of data and use these descriptions to verify that the wrapper is correctly extracting data. The repair algorithm automatically recovers from Web source format changes by identifying the data on the changed pages so that a new wrapper can be generated for the source.
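The verification idea can be illustrated with a minimal sketch. The abstract does not specify the learning algorithm, so everything here is an assumption made for illustration: the token classes (`NUM`, `CAP`, `UPPER`, `LOWER`), the use of a fixed-length prefix pattern as the "structural description", the thresholds, and the hypothetical function names `learn_patterns` and `verify`. The sketch learns which token-class patterns begin known-good field values, then flags a wrapper when too few of its new extractions start with a learned pattern.

```python
import re
from collections import Counter

# Split a string into alphabetic runs, digit runs, and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]")


def token_class(tok):
    """Map a token to a coarse syntactic class (illustrative classes only)."""
    if tok.isdigit():
        return "NUM"
    if tok.isalpha():
        if tok.isupper():
            return "UPPER"
        if tok[0].isupper():
            return "CAP"
        return "LOWER"
    return tok  # punctuation is kept as its own class


def prefix_pattern(value, k=3):
    """Describe a value by the classes of its first k tokens."""
    return tuple(token_class(t) for t in TOKEN_RE.findall(value)[:k])


def learn_patterns(examples, k=3, min_support=0.1):
    """Keep prefix patterns that cover at least min_support of the examples."""
    counts = Counter(prefix_pattern(v, k) for v in examples)
    return {p for p, c in counts.items() if c / len(examples) >= min_support}


def verify(patterns, extracted, k=3, min_match=0.8):
    """Return (ok, match_rate); ok is False when too few values fit a pattern."""
    if not extracted:
        return False, 0.0
    hits = sum(prefix_pattern(v, k) in patterns for v in extracted)
    rate = hits / len(extracted)
    return rate >= min_match, rate


# Hypothetical training data: street addresses extracted while the wrapper worked.
known_good = ["4676 Admiralty Way", "10 Main Street", "220 Lincoln Blvd", "523 Vernon Ave"]
patterns = learn_patterns(known_good)

# Extractions after a hypothetical site redesign pick up unrelated page text.
ok_before, rate_before = verify(patterns, ["12 Pico Blvd", "3000 Ocean Park"])
ok_after, rate_after = verify(patterns, ["Click here for map", "<b>Hours:</b> 9am"])
```

In this sketch a failed check would be the trigger for repair: the same learned patterns could be used to search the changed pages for text that looks like the expected field, yielding new labeled examples from which a wrapper can be regenerated.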
