Abstract

The World Wide Web holds a large collection of data, scattered across deep web databases and web pages in unstructured or semi-structured formats. Recently developed, user-friendly web applications need dedicated data extraction mechanisms that draw the required data out of the deep web according to the end user's query and populate the output page dynamically and as quickly as possible. In existing research, web data extraction methods are based on supervised learning (wrapper induction). In the past few years, researchers have focused on automatic web data extraction methods based on similarity measures. Among these automatic methods, our existing Combining Tag and Value Similarity method falls short in identifying the attributes in the query result table. This paper proposes a novel approach to data extraction and label assignment, called Annotation for Query Result Records, based on a domain-specific ontology. First, a domain ontology is constructed using information from the query interfaces and query result pages obtained from the web. Next, using this domain ontology, a meaningful label is assigned automatically to each column of the extracted query result records.
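The label assignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy "book" ontology, the names `ONTOLOGY`, `score`, and `annotate`, the matching rule (known values or a regular expression), and the 0.5 threshold are all assumptions made for illustration. In the described approach, the ontology would instead be built from query interfaces and query result pages.

```python
import re

# Hypothetical toy ontology for a "book" domain (an assumption for this sketch).
# Each concept lists known instance values and/or a pattern for its values.
ONTOLOGY = {
    "Title": {"values": {"dune", "emma", "ulysses"}, "pattern": None},
    "Price": {"values": set(), "pattern": re.compile(r"^\$\d+(\.\d{2})?$")},
    "Year": {"values": set(), "pattern": re.compile(r"^\d{4}$")},
}


def score(column, concept):
    """Return the fraction of cells in a column that match a concept."""
    if not column:
        return 0.0
    hits = 0
    for cell in column:
        text = cell.strip()
        if text.lower() in concept["values"]:
            hits += 1
        elif concept["pattern"] is not None and concept["pattern"].match(text):
            hits += 1
    return hits / len(column)


def annotate(records, ontology=ONTOLOGY, threshold=0.5):
    """Assign one label per column of the extracted query result records."""
    labels = []
    # zip(*records) turns a list of rows into a sequence of columns.
    for column in zip(*records):
        best_label, best_score = "Unknown", 0.0
        for label, concept in ontology.items():
            s = score(column, concept)
            if s > best_score:
                best_label, best_score = label, s
        # Columns with no sufficiently strong match stay unlabeled.
        labels.append(best_label if best_score >= threshold else "Unknown")
    return labels


# Example query result records, one row per record (illustrative data only).
records = [
    ["Dune", "$9.99", "1965"],
    ["Emma", "$7.50", "1815"],
    ["Ulysses", "$12.00", "1922"],
]
print(annotate(records))
```

Scoring whole columns rather than single cells means that a few unusual values do not change the label, which is one plausible reason to annotate at the column level.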
