Abstract
Named entity recognition (NER) is a fundamental step for many natural language processing tasks, so improving the performance of NER models is valuable. Because few resources are available, NER for South-East Asian languages like Telugu is a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features, which are generated automatically from Wikipedia pages. We use these gazetteer features along with other well-known features, such as contextual, word-level, and corpus features, to build NER models. The models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
Highlights
Named entity recognition (NER) is a sub-task of information extraction (IE) that identifies and classifies textual elements into a pre-defined set of categories called named entities (NEs), such as the name of a person, organization, or location, expressions of time, quantities, monetary values, percentages, etc.
We put forth an approach to generate gazetteers dynamically for three named entities—person, location, and organization—and propose gazetteer-based features for Telugu NER
We performed morphological pre-processing and used language-dependent features to enhance the performance of the NER models
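To illustrate how gazetteer membership can sit alongside word-level and contextual features, here is a minimal sketch in Python. It is not the authors' implementation: the gazetteer entries, the feature names, and the helper functions `token_features` and `sentence_features` are all hypothetical, and the tiny hand-written lists stand in for the Wikipedia-derived gazetteers described above.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers (assumption). The paper generates its lists
# automatically from Wikipedia pages; that step is not reproduced here.
GAZETTEERS: Dict[str, Set[str]] = {
    "PER": {"ramarao", "sita"},
    "LOC": {"hyderabad", "guntur"},
    "ORG": {"infosys"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i] from word-level, contextual,
    and gazetteer-membership cues (illustrative feature set only)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Word-level features
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_digit": word.isdigit(),
        # Contextual features: neighbouring tokens with boundary markers
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Gazetteer features: one binary flag per entity type
    for label, names in GAZETTEERS.items():
        feats[f"in_gaz_{label}"] = word.lower() in names
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Sita", "visited", "Hyderabad"]):
        print(feats)
```

Per-token feature dictionaries of this kind are a common input format for sequence labellers, so the same features could in principle be passed to a CRF, SVM, or MIRA learner for comparison.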
Summary
Named entity recognition (NER) is a sub-task of information extraction (IE) that identifies and classifies textual elements (words or sequences of words) into a pre-defined set of categories called named entities (NEs), such as the name of a person, organization, or location, expressions of time, quantities, monetary values, percentages, etc. NER plays an essential role in extracting knowledge from digital information stored in structured or unstructured form. It acts as a pre-processing tool for many applications. The research study by Babych and Hartley [4] showed that including a pre-processing step by tagging text with