Abstract
Our survey shows that the techniques used to extract data from the deep web must be improved for automatic wrappers to be efficient and accurate. Further investigation indicates that a lightweight ontological technique built on an existing lexical database for English (WordNet) can check the similarity of data records and, by using the semantic properties of these records, detect the correct data region with higher precision. The method can extract three types of data records: single-section, multiple-section, and loosely structured data records. It also provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and outperforms existing state-of-the-art wrappers. Tests also show that our wrapper can extract data records from multilingual web pages and that it is domain independent.
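To make the idea of semantic record similarity concrete, the following is a minimal sketch and not the wrapper described in the paper. It assumes a tiny hand-written is-a taxonomy (`PARENT`) as a stand-in for WordNet, a Wu-Palmer-style word similarity, and a hypothetical rule that the data region is the candidate whose records are most similar to one another on average. All names, the taxonomy, and the scoring rule are illustrative assumptions.

```python
from itertools import combinations

# Toy is-a taxonomy standing in for WordNet (assumption: invented for this sketch).
PARENT = {
    "entity": None,
    "artifact": "entity",
    "quantity": "entity",
    "person": "entity",
    "book": "artifact",
    "camera": "artifact",
    "price": "quantity",
    "weight": "quantity",
    "author": "person",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root] in the toy taxonomy."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def word_similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b)), root depth 1."""
    if a not in PARENT or b not in PARENT:
        # Unknown words fall back to exact string match.
        return 1.0 if a == b else 0.0
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # First shared node walking up from a is the lowest common subsumer.
    lcs = next(node for node in path_a if node in ancestors_b)
    return 2.0 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def record_similarity(rec_a, rec_b):
    """Symmetric average of each word's best match in the other record."""
    if not rec_a or not rec_b:
        return 0.0

    def directed(src, dst):
        return sum(max(word_similarity(s, d) for d in dst) for s in src) / len(src)

    return (directed(rec_a, rec_b) + directed(rec_b, rec_a)) / 2.0


def region_score(records):
    """Mean pairwise similarity of the records in one candidate region."""
    pairs = list(combinations(records, 2))
    if not pairs:
        return 0.0
    return sum(record_similarity(a, b) for a, b in pairs) / len(pairs)


def pick_data_region(regions):
    """Hypothetical rule: choose the region whose records agree most semantically."""
    return max(regions, key=lambda name: region_score(regions[name]))


if __name__ == "__main__":
    candidate_regions = {
        "results": [["book", "author", "price"],
                    ["book", "author", "price"],
                    ["camera", "price"]],
        "navigation": [["person"], ["weight"], ["artifact"]],
    }
    print(pick_data_region(candidate_regions))
```

With a real lexical database, the toy `PARENT` map would be replaced by lookups into WordNet's hypernym hierarchy; the record and region scoring would stay the same in shape.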
More From: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)