Abstract

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location, and organization. NER plays an important role in many Natural Language Processing applications, including information retrieval, question answering, and machine translation. Resolving the ambiguities of the lexical items in a text document is a challenging task, and NER in Indian languages is particularly complex because of their morphological richness and agglutinative nature. Although many solutions have been proposed, NER remains an unsolved problem. Traditional approaches applied hand-crafted features to classical machine learning techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), and Conditional Random Field (CRF). The introduction of deep learning changed this picture: state-of-the-art results have since been achieved with deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic, and morphological information. We propose a deep learning based entity extraction system for Indian languages that uses a novel combined word representation comprising character-level, word-level, and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results show an improvement over existing pre-trained word representations.
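
The idea of a combined word representation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the vector sizes, the affix length of 3, and the hash-based `toy_embedding` lookup are all assumptions made for illustration, and the averaged character vector only stands in for whatever learned character-level encoder a real system would use. The sketch shows only the shape of the approach, in which word-level, character-level, and affix-level vectors are concatenated into one vector per token.

```python
import hashlib
from typing import List, Tuple


def toy_embedding(key: str, dim: int) -> List[float]:
    """Deterministic stand-in for a learned embedding lookup (hypothetical)."""
    if dim > 32:
        raise ValueError("this toy lookup supports at most 32 dimensions")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first `dim` bytes of the hash to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def char_level(word: str, dim: int) -> List[float]:
    """Average of per-character vectors (a learned encoder is assumed in practice)."""
    if not word:
        return [0.0] * dim
    vectors = [toy_embedding("char:" + ch, dim) for ch in word]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def affixes(word: str, n: int) -> Tuple[str, str]:
    """Return the first and last `n` code points as prefix and suffix.

    Slicing works on Unicode code points, not grapheme clusters, so an
    Indic-script affix may split a conjunct; this is a simplification.
    """
    return word[:n], word[-n:]


def combined_representation(
    word: str,
    word_dim: int = 8,
    char_dim: int = 4,
    affix_dim: int = 4,
    affix_len: int = 3,
) -> List[float]:
    """Concatenate word-level, character-level, and affix-level vectors."""
    prefix, suffix = affixes(word, affix_len)
    return (
        toy_embedding("word:" + word, word_dim)
        + char_level(word, char_dim)
        + toy_embedding("prefix:" + prefix, affix_dim)
        + toy_embedding("suffix:" + suffix, affix_dim)
    )


vector = combined_representation("दिल्ली")
print(len(vector))  # 8 + 4 + 4 + 4 = 20
```

In a trained system each lookup would be a learned embedding table, and the concatenated vector would feed a sequence tagger; the affix component is what lets words sharing a prefix or suffix share part of their representation.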

Highlights

  • The information available on the internet is growing rapidly

  • We address the problem of effective word representation for Named Entity Recognition (NER) in Indian languages by capturing the syntactic, semantic and morphological information

  • The model achieved an average accuracy of 97.45% on IECSIL (Information Extractor for Conversational Systems in Indian Languages) test data

Summary

Introduction

The information available on the internet is growing rapidly. The number of internet users rises every year, and large volumes of text and images are added to the internet every second. This information is stored on the web in an unstructured manner, so finding the relevant information in it is very time-consuming. This is where information extraction (IE), a sub-branch of Artificial Intelligence, becomes important. IE transforms unstructured text into a structured form that is convenient for machine-level processing, and it plays an important role in information retrieval, question answering, summarization, and so forth [1]. NER is one of the subdomains of IE and originated at the Sixth Message Understanding Conference (MUC-6) [2].

