Abstract
Arabic belongs to the Semitic family of languages, whose sentence structure is entirely different from a Natural Language Processing standpoint. In such languages, two different words may have identical spellings while their pronunciations and meanings are totally different. To remove this ambiguity, special marks are placed above or below the characters to determine the correct pronunciation. These marks are called diacritics, and a language that uses them is called a diacritized language. This paper presents a system for Arabic language diacritization using Hidden Markov Models (HMMs). The system employs the well-known HMM Tool Kit (HTK). Each diacritic is represented as a separate model. The concatenation of the output models is coupled with the input character sequence to form the fully diacritized text. The performance of the proposed system is assessed on a corpus of more than 24,000 sentences.
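The decode-and-couple idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' HTK pipeline: it is a generic discrete HMM decoded with the Viterbi algorithm, in which each hidden state stands for one diacritic and each observation is one undiacritized character. The function names (`viterbi`, `couple`), the three-label inventory, and every probability below are assumptions made up for illustration; none of them comes from the paper.

```python
import math

# Hypothetical label inventory mapped to Unicode combining marks.
DIACRITICS = {"fatha": "\u064E", "damma": "\u064F", "none": ""}
STATES = list(DIACRITICS)

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a non-empty observation list."""
    # Each column maps state -> (best log-probability, previous state).
    first = observations[0]
    columns = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
                for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)
    # Backtrace from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

def couple(characters, labels):
    """Interleave each input character with the mark of its decoded label."""
    if len(characters) != len(labels):
        raise ValueError("need exactly one label per character")
    return "".join(ch + DIACRITICS[lab] for ch, lab in zip(characters, labels))

# Toy parameters (invented for this sketch, not estimated from any corpus).
KAF, TA, BA = "\u0643", "\u062A", "\u0628"
start_p = {"fatha": 0.6, "damma": 0.3, "none": 0.1}
trans_p = {
    "fatha": {"fatha": 0.6, "damma": 0.2, "none": 0.2},
    "damma": {"fatha": 0.2, "damma": 0.6, "none": 0.2},
    "none":  {"fatha": 0.4, "damma": 0.4, "none": 0.2},
}
emit_p = {
    "fatha": {KAF: 0.4, TA: 0.4, BA: 0.2},
    "damma": {KAF: 0.3, TA: 0.3, BA: 0.4},
    "none":  {KAF: 0.25, TA: 0.25, BA: 0.5},
}

bare_word = KAF + TA + BA  # the undiacritized letters k-t-b
labels = viterbi(list(bare_word), STATES, start_p, trans_p, emit_p)
diacritized = couple(bare_word, labels)
print(labels, diacritized)
```

With these toy numbers the decoder assigns a fatha to every letter. A real system would estimate the parameters from a diacritized corpus and cover the full diacritic inventory.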
Summary
The pronunciation of a word in some languages, like English, is almost always fully determined by its constituent characters. In these languages, the sequence of consonants and vowels determines the correct sound when a word is pronounced. In other languages, like Arabic, the pronunciation of a word cannot be fully determined by its spelling alone. Arabic characters may carry diacritics, which are written as strokes and can change both the pronunciation and the meaning of the word. The formal approach to restoring the diacritical marks of Arabic text involves a complex integration of Arabic morphological, syntactic, and semantic rules [4]. Semantics help to resolve ambiguous cases and to filter out hypotheses.
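The ambiguity described above can be seen directly at the character level. The short sketch below takes two diacritized words, "kataba" (he wrote) and "kutub" (books), and shows that they collapse to the same letter sequence once the marks are removed. The helper name `strip_diacritics` is hypothetical, and the character range is an assumption that covers only the common short-vowel, tanween, shadda, and sukun marks (U+064B to U+0652).

```python
import re

# Common Arabic diacritics: tanween, short vowels, shadda, sukun.
DIACRITIC_RE = re.compile("[\u064B-\u0652]")

def strip_diacritics(text):
    """Remove the combining marks matched by DIACRITIC_RE."""
    return DIACRITIC_RE.sub("", text)

kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha
kutub = "\u0643\u064F\u062A\u064F\u0628"         # kaf+damma, ta+damma, ba

# Different words when diacritized, identical once the marks are stripped.
print(kataba != kutub)
print(strip_diacritics(kataba) == strip_diacritics(kutub))
```

Diacritic restoration is the inverse of this stripping step: given only the bare letters, the system has to decide which of the possible readings was intended.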