Abstract

N-gram probabilities provide valuable information for understanding, processing, and modelling natural language. They assign probabilities to sequences of words and, in turn, to whole sentences. Such information is essential for making more accurate predictions in machine-learning-based systems. In this paper, we work on finding Parts-of-Speech (PoS) sequence-based patterns in Assamese questions. We derive the unique bi-grams and tri-grams of PoS tags occurring in these patterns and extract their probabilities. We then attempt to identify the unique PoS patterns of Assamese questions. We also attempt to incorporate the probabilities of the unique bi-grams and tri-grams, together with the combined bi-gram and tri-gram probabilities of all patterns. Our work is a novel approach to finding the bi-gram and tri-gram probabilities of the patterns occurring in Assamese questions.
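
To make the idea of bi-gram and tri-gram probabilities over PoS patterns concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes each probability is a plain relative frequency (the count of an n-gram divided by the total number of n-grams of that order across all patterns); the abstract does not state how the probabilities are estimated. The function names and the PoS tags (`PRP`, `NN`, `WQ`, `VB`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tags, n):
    """Return the n-grams (as tuples) found in one PoS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_probabilities(patterns, n):
    """Relative frequency of each unique n-gram across all patterns.

    Assumption: probability = count / total n-grams of order n.
    """
    counts = Counter(g for p in patterns for g in ngrams(p, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical question patterns; the tags are illustrative only
# and are not taken from the paper's data.
patterns = [
    ["PRP", "WQ", "VB"],
    ["NN", "WQ", "VB"],
    ["PRP", "NN", "WQ", "VB"],
]

bigram_probs = ngram_probabilities(patterns, 2)
trigram_probs = ngram_probabilities(patterns, 3)

# Print the unique n-grams from most to least probable.
for gram, prob in sorted(bigram_probs.items(), key=lambda kv: -kv[1]):
    print(" ".join(gram), round(prob, 3))
for gram, prob in sorted(trigram_probs.items(), key=lambda kv: -kv[1]):
    print(" ".join(gram), round(prob, 3))
```

The keys of each dictionary are the unique n-grams, so the same pass yields both the inventory of unique bi-grams or tri-grams and their estimated probabilities. A conditional estimate (the probability of a tag given the preceding one or two tags) would be an equally plausible reading and would require dividing by the count of the preceding context instead of the overall total.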
