Abstract

Recognizing the importance of Information Extraction (IE) in supporting innovation across many areas of library services, the authors set out to build a Chinese information extraction system that can effectively process large volumes of Chinese information resources. The authors propose a Chinese IE solution that makes full use of GATE (General Architecture for Text Engineering), developed at the University of Sheffield, by building a Chinese IE plug-in that processes Chinese information resources within the GATE framework. The article analyses the architecture of the GATE system, describes the GATE-based Chinese IE solution, and focuses on three key difficulties in implementing a Chinese information extraction system: (1) Chinese tokenization; (2) domain-specific gazetteers; and (3) Chinese named entity recognition. The authors have implemented the system and carried out an experiment in which it successfully extracted information from thousands of science and technology news items. The authors believe the system is a significant first attempt that lays a solid foundation for future research.
