A Hidden Markov Model for Morphology of Compound Roles in Persian Text Part of Tagging

H Rezaei, Homayun Motameni, Behnam Barzegar

doi:10.5829/ije.2021.34.11b.12

Abstract

Data mining has grown in importance with the popularity of social networks and the emergence of abbreviated words, foreign terms and emoticons in the Persian language. Numerous studies have been conducted to identify the type of each word. Identifying the role of each word in a sentence, however, is far more important than identifying its type. Moreover, because Persian is similar to Arabic in spelling and grammar, the method proposed in this paper can also be applied to Arabic. In this paper, we adopt the Hidden Markov Model (HMM) with Tri-gram tagging to identify the morphology of composition roles in Persian sentences. We then compare the proposed technique with the Hidden Markov Model with Uni-gram and Bi-gram tagging. The proposed method identifies word roles in terms of "independent" and "dependent" roles, together with several factors that contribute to the roles of words in sentences. The simulation results show that, for independent composition roles, the average success rates of HMM with Tri-gram tagging were 20.56% and 17.67% higher than those of the Uni-gram and Bi-gram methods, respectively. For dependent composition roles, the improvements were 24.67% and 32.62%, respectively.
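
To make the combination of a Hidden Markov Model with Tri-gram tagging concrete, the following is a minimal sketch of a generic trigram HMM tagger decoded with the Viterbi algorithm. It is not the implementation evaluated in this paper: the role labels (SUBJ, OBJ, VERB), the transliterated toy sentences, the add-one smoothing and all function names are assumptions chosen only for illustration.

```python
from collections import defaultdict
import math

START, STOP = "<s>", "</s>"

def train(tagged_sentences):
    """Count role trigrams, their bigram contexts, and (role, word) emissions."""
    tri, ctx = defaultdict(int), defaultdict(int)
    emit, tag_count = defaultdict(int), defaultdict(int)
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
    return tri, ctx, emit, tag_count, vocab

def viterbi(words, model):
    """Return the most probable role sequence under the trigram HMM."""
    if not words:
        return []
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_next = len(tags) + 1      # symbols that can follow a context: roles + STOP
    n_words = len(vocab) + 1    # known words + one slot for unknown words

    def log_q(u, v, w):         # add-one smoothed transition (assumption)
        return math.log((tri[(u, v, w)] + 1) / (ctx[(u, v)] + n_next))

    def log_e(t, word):         # add-one smoothed emission (assumption)
        return math.log((emit[(t, word)] + 1) / (tag_count[t] + n_words))

    # best[(u, v)]: best log-probability of a prefix whose last two roles are u, v
    best = {(START, START): 0.0}
    back = []
    for word in words:
        new_best, pointers = {}, {}
        for (u, v), score in best.items():
            for t in tags:
                s = score + log_q(u, v, t) + log_e(t, word)
                if (v, t) not in new_best or s > new_best[(v, t)]:
                    new_best[(v, t)] = s
                    pointers[(v, t)] = u
        best = new_best
        back.append(pointers)

    # Pick the final state that best transitions into STOP, then backtrack.
    u, v = max(best, key=lambda k: best[k] + log_q(k[0], k[1], STOP))
    seq = [u, v]
    for i in range(len(words) - 1, 1, -1):
        seq.insert(0, back[i][(seq[0], seq[1])])
    return seq[-len(words):]

# Hypothetical toy corpus (transliterated, subject-object-verb order).
CORPUS = [
    [("ali", "SUBJ"), ("ketab", "OBJ"), ("kharid", "VERB")],
    [("sara", "SUBJ"), ("nameh", "OBJ"), ("nevesht", "VERB")],
    [("ali", "SUBJ"), ("nameh", "OBJ"), ("kharid", "VERB")],
]
model = train(CORPUS)
roles = viterbi(["sara", "ketab", "nevesht"], model)
```

In this sketch each Viterbi state is a pair of consecutive roles, so the transition probability conditions on the two preceding roles. A Bi-gram or Uni-gram variant would shrink that context to one role or none, which is the kind of baseline the comparison above refers to.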

Full Text