ABSTRACT
Question Answering (QA) systems attempt to retrieve precise answers to questions posed by users in natural language. QA is a sophisticated form of Information Retrieval (IR) that draws on a predefined collection of raw natural-language data. Malayalam, one of the official languages of India, is not only morphologically rich and agglutinative but also resource-constrained. These characteristics make QA in Malayalam very challenging. This paper proposes a deep learning based QA system for Malayalam using Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Memory Network models. The Facebook bAbI dataset, which consists of 20 tasks whose questions involve multiple supporting facts, inductive and deductive reasoning, coreference, and similar phenomena, was used to train and test the system. Of the three models implemented, the Memory Network model achieved the best average accuracy (80%) in retrieving exact answers in Malayalam. This work is novel because all previously reported work on Malayalam QA is rule-based and can extract answers to factoid questions only. Because it is built on deep learning, the proposed system is scalable and can therefore advance ongoing research in Malayalam QA and support the development of the Malayalam QA corpus.