Abstract

Automatic question and answer generation is a challenging task in natural language processing. The proposed system automatically generates questions and answers from history-related text in Tamil, processing the input with several NLP techniques. It has four modules: a preprocessing module, a rule-based module, a Named Entity Recognition (NER) module, and a Question Answer Generator (QAG) module. The rule-based module uses regex patterns and gazetteers, while the NER module takes a machine learning approach: it is built on a Conditional Random Field (CRF) classifier trained on a manually tagged Tamil dataset for the history domain. Questions are formed by applying grammatical and predefined rules to the named entities identified by both the rule-based and NER modules. An affix stripping algorithm is implemented to find the inflection suffix. The output generated from a history text taken from Wikipedia was evaluated by 16 native Tamil speakers grouped as undergraduates, graduates and experts. According to the evaluation results, 62.22% of all generated questions are grammatically correct and meaningful, despite the domain-related and language-related challenges.
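
As a rough illustration of the affix stripping step, the following minimal Python sketch removes an inflection suffix by longest-match lookup. The suffix table, the function name strip_affix, and the example word are assumptions made for illustration only; they cover just a few case endings for nouns that end in the vowel "ai" (such as "மதுரை") and are not the rule set used by the system described here.

```python
# Illustrative inflection suffixes for nouns ending in the vowel "ai"
# (for example "மதுரை"). This table is an assumption for the sketch,
# not the rule set of the system described above.
SUFFIXES = {
    "யில்": "locative",
    "யை": "accusative",
    "க்கு": "dative",
}


def strip_affix(word):
    """Return (stem, suffix, case), trying longer suffixes first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a non-empty stem so a bare suffix is never reduced to "".
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix, SUFFIXES[suffix]
    # No known suffix: treat the word as uninflected.
    return word, "", None


if __name__ == "__main__":
    for token in ["மதுரையில்", "மதுரைக்கு", "மதுரை"]:
        print(token, strip_affix(token))
```

A question generator could use the recovered case label to choose an interrogative word that agrees with the stripped suffix, for example a "where" question for a locative form. A practical stripper would also need to handle the sandhi changes that occur when suffixes attach to consonant-final stems, which this sketch does not attempt.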
