Abstract

The effective creation of Information Extraction (IE) over the last ten years, driven by the DARPA TIPSTER programme and the associated MUC evaluations, has brought enormous benefits, but it is now time to ask what research issues face the systems we have built and what we should do next. We suggest that there are two classes of important research issues: those requiring detailed comparative evaluation of alternative approaches to IE subtasks, and those to do with flexible adaptation of IE systems to new users and domains.

Both these classes of issues, we argue, can be profitably addressed within an architecture for language engineering called GATE, the General Architecture for Text Engineering. We describe GATE, which owes a great deal to the TIPSTER architecture, and also the LaSIE IE system, which is set within GATE and with which we have competed in MUC, and we bring out the distinctive features that have led to its good performance in certain areas.

Within GATE, we can now reconfigure various Language Engineering (LE) modules so as to assemble alternative IE systems and then compare their performance with that of LaSIE. In this way the environment provided by GATE will allow us to make significant strides in assessing alternative LE technologies and in rapidly adapting LE prototype systems for new users and domains.

Keywords: Natural Language Processing; Information Extraction; Wall Street Journal; Language Engineering; Defense Advanced Research Projects Agency
