N-Gram based Assamese Question Pattern Extraction and Probabilistic Modelling

Rita Chakraborty

doi:10.52783/jes.2996

Abstract

N-gram probabilities provide valuable information for understanding, processing, and modelling in natural language processing tasks. They assign probabilities to sequences of words and, in turn, to whole sentences. Such information is essential for making more accurate predictions in machine-learning-based systems. In this paper, we work on finding Parts-of-Speech (PoS) sequence based patterns of Assamese questions. We derive the unique bi-grams and tri-grams of PoS tags occurring in the patterns and compute their probabilities. We then find the unique PoS patterns of Assamese questions. We also incorporate the probabilities of the unique bi-grams and tri-grams, and the combined bi-gram and tri-gram probabilities, of all patterns. Our work is a novel approach to finding the probabilities of the bi-grams and tri-grams of the patterns occurring in Assamese questions.
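As an illustration of the idea summarised above, the following is a minimal sketch of how bi-gram and tri-gram probabilities over PoS sequences might be estimated. It assumes plain maximum-likelihood estimation (the count of an n-gram divided by the count of its preceding context), which the abstract does not confirm as the estimator used in the paper. The tag names, the three example patterns, and the function names `ngrams`, `ngram_probabilities`, and `pattern_probability` are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def ngrams(tags, n):
    """Return all contiguous n-grams of a PoS tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_probabilities(patterns, n):
    """Estimate P(last tag | preceding n-1 tags) by maximum likelihood.

    Assumption: count(n-gram) / count(context), where a context is
    counted each time it is followed by another tag.
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for tags in patterns:
        for gram in ngrams(tags, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}


def pattern_probability(tags, probs, n):
    """Multiply the n-gram probabilities along one PoS pattern.

    An n-gram that was never observed contributes probability 0.
    """
    result = 1.0
    for gram in ngrams(tags, n):
        result *= probs.get(gram, 0.0)
    return result


# Hypothetical PoS patterns of questions (illustrative tags only).
patterns = [
    ["PRON", "NOUN", "VERB", "PUNC"],
    ["PRON", "VERB", "PUNC"],
    ["NOUN", "PRON", "VERB", "PUNC"],
]

bigram_probs = ngram_probabilities(patterns, 2)
trigram_probs = ngram_probabilities(patterns, 3)

for gram, p in sorted(bigram_probs.items()):
    print(gram, round(p, 3))

print(pattern_probability(["PRON", "VERB", "PUNC"], bigram_probs, 2))
```

In this sketch the probability of a whole pattern is the product of its n-gram probabilities, so a pattern containing an unseen n-gram receives probability zero; a smoothing scheme would be needed to avoid that, and is left out here for brevity.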

Full Text