Abstract

Information Harvest Warehouse (IHWA) is a web-based information search system. It is designed using the Component-Based Software Engineering (CBSE) paradigm, in which applications are developed by integrating server-side EJB components and client-side JCC components. The search system is undergoing a major reconstruction to make it more general and robust, and to prepare it for evolving electronic commerce demands. In this paper, we describe the development of the meta-information gathering service of IHWA (the meta gatherer), which collects and extracts information from semi-structured or unstructured data sources. The focus is on the gatherer's information extraction service for semi-structured Internet information sources, specifically XML data whose DTD is unknown. The information extraction module provides clean Java programming interfaces, so it can be easily integrated with other applications. Its implementation is also efficient: it analyzes a source XML file in one pass, whereas most other systems use a two-pass approach.
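
As an illustration of what one-pass analysis of DTD-unknown XML can look like, the following is a minimal sketch built on the standard Java SAX API. It is not the IHWA implementation: the class name `OnePassExtractor`, the method `extract`, and the path-to-values map it returns are hypothetical choices made only for this example. The handler discovers the element structure and collects text values during the same traversal, so no separate structure-discovery pass is needed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the interface of the IHWA gatherer.
class OnePassExtractor extends DefaultHandler {
    // Element path -> text values found directly under that path, in document order.
    private final Map<String, List<String>> valuesByPath = new LinkedHashMap<>();
    // Names of the currently open elements, root first.
    private final Deque<String> openElements = new ArrayDeque<>();
    // One text buffer per open element, so child text is not mixed into the parent.
    private final Deque<StringBuilder> textBuffers = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        openElements.addLast(qName);
        textBuffers.addLast(new StringBuilder());
        // Register the path the first time it is seen: the structure is
        // learned from the data itself because no DTD is available.
        valuesByPath.computeIfAbsent(currentPath(), k -> new ArrayList<>());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!textBuffers.isEmpty()) {
            textBuffers.peekLast().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = textBuffers.removeLast().toString().trim();
        if (!text.isEmpty()) {
            valuesByPath.get(currentPath()).add(text);
        }
        openElements.removeLast();
    }

    private String currentPath() {
        return "/" + String.join("/", openElements);
    }

    // Parses the XML once and returns both the discovered paths and their values.
    static Map<String, List<String>> extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            OnePassExtractor handler = new OnePassExtractor();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.valuesByPath;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
    }
}
```

Calling `OnePassExtractor.extract(xml)` on a small catalog document would yield keys such as `/catalog/item/name`, each mapped to the text values found at that path. A streaming parser is used here because it reads the input exactly once; a DOM-based design would first build a tree and then walk it, which is closer to a two-pass approach.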
