Abstract

Azeri (Azerbaijani) is one of more than 50 Turkic languages, and it has been little studied with modern signal processing algorithms. This paper addresses next-word prediction for Azeri using Hidden Markov Models (HMMs), following Natural Language Processing (NLP) principles and implemented in the Python high-level programming language. The software comprises a small Azeri vocabulary database, various Python libraries, an HMM and a Web-based interface. The database was constructed by a predictor parser, implemented here for the first time for the Azeri language. It was populated with the most common Azeri words, which serve as the basis for the HMM-generated word pairs. The model was trained on 90% of the database; predicting the next 5 words on the test data yielded 54% accuracy.
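The word-pair idea and the 90% training split described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order Markov chain over observed words (bigram transition counts), which is a simplification of a full HMM with hidden states. The toy English corpus and the function names `train_transitions`, `predict_next` and `top_k_accuracy` are hypothetical stand-ins for the Azeri database and the paper's software.

```python
from collections import Counter, defaultdict


def train_transitions(sentences):
    """Count word-pair transitions: counts[w1][w2] = times w2 followed w1."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word, k=5):
    """Return up to k most frequent successors of `word` (empty if unseen)."""
    if word not in counts:
        return []
    return [candidate for candidate, _ in counts[word].most_common(k)]


def top_k_accuracy(counts, sentences, k=5):
    """Fraction of test word pairs whose true next word is among the top k."""
    hits = 0
    total = 0
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            total += 1
            if following in predict_next(counts, current, k):
                hits += 1
    return hits / total if total else 0.0


# Hypothetical toy corpus (English placeholders, not the Azeri database).
corpus = [
    "the cat sat down",
    "the cat ran away",
    "the dog sat down",
    "the dog ran home",
    "a cat sat still",
    "a dog ran away",
    "the bird sat down",
    "a bird ran home",
    "the cat sat still",
    "the dog sat down",
]

# 90% of the sentences for training, the rest held out for testing.
split = int(len(corpus) * 0.9)
train_sentences, test_sentences = corpus[:split], corpus[split:]

train_counts = train_transitions(train_sentences)
accuracy = top_k_accuracy(train_counts, test_sentences, k=5)
print(predict_next(train_counts, "the", k=5))
print(f"top-5 accuracy on held-out sentences: {accuracy:.2f}")
```

On a toy corpus this small the held-out score says nothing about the 54% figure reported for Azeri; the sketch only shows how a top-5 hit rate of this kind could be computed.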

Highlights

  • Azeri (Azerbaijani) is one of more than 50 Turkic languages [1], yet it has been little studied in terms of modern signal processing algorithms and the creation of modern language technology applications [2]. Despite a large body of research on other languages since the 1980s, Azeri remains little investigated; the existing studies apply Automatic Speech Recognition (ASR), Text-To-Speech (TTS) or Authorship Recognition (AR) algorithms to the language, as in the “Dilmanc” project [2,3,4,5,6]. This research addresses word prediction for Azeri for the first time

  • This paper addresses Hidden Markov Model (HMM) based word prediction for Azeri, following Natural Language Processing (NLP) principles and using the Python high-level programming language

  • The database was constructed by a predictor parser, implemented for the first time for the Azeri language


Summary

INTRODUCTION

Azeri (Azerbaijani) is one of more than 50 Turkic languages [1], yet it has been little studied in terms of modern signal processing algorithms and the creation of modern language technology applications [2]. Despite a large body of research on other languages since the 1980s, Azeri remains little investigated; the existing studies apply Automatic Speech Recognition (ASR), Text-To-Speech (TTS) or Authorship Recognition (AR) algorithms to the language, as in the “Dilmanc” project [2,3,4,5,6]. This research addresses word prediction for Azeri. Reducing the time spent typing in electronic communication by means of word prediction would be very helpful in day-to-day use. Over the last decade, word prediction for typing in electronic communication has been one of the most discussed topics in Natural Language Processing research [7]. This paper briefly reviews HMMs and discusses the training of the model, and then explains how the database was collected.

HMMS AND THE TRAINING ON THE MODEL
COLLECTION OF THE DATABASE
THE SOFTWARE
SOME EXPERIMENTAL RESULTS
CONCLUSIONS
