Abstract

Extracting information from the Web is a complex task with different components which can either be generic or specific to the task, going from downloading a given page, following links, querying a Web-based applications via an HTML form and the HTTP protocol, querying a Web service via the SOAP protocol, etc. Therefore building Web services which proceed to executing an information tasks can not be simply hard coded (i.e. written and compiled once and for all in a given programming language). In order to be able to build flexible information extraction Web Services we need to be able to compose different sub tasks together. We propose a, XML-based language to describe information extraction Web services as the compositions of existing Web services and specific functions. The usefulness the proposed framework is demonstrated by three real world applications. (1) Search engines: we show how to describe a task which queries Google's Web service, retrieves more information on the results by querying their respective HTTP servers, and filters them according to this information. (2) E-commerce sites : an information extraction Web service giving access to an existing HTML-based e-commerce online application such as Amazon is built. (3) Patent extraction: a last example shows how to describe an information extraction Web service which allows to query a Web-based application, extract the set of result links, follow them, and extract the needed information on the result pages. In all three applications the generated description can be easily modified and completed to further respond the user's needs and create value-added Web services.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.