Abstract

Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many Web applications, such as Web database integration and deep Web crawling, require the construction of the schemas. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression based approach to automatically extract the logical attributes from search interfaces. We also rephrase the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema from any Web search interfaces. Our experimental results on real search interfaces indicate that this system is highly effective.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call