Abstract

Communication languages convey information through a set of symbols or units. Typically, this unit is the word. When developing language technologies, because words in a language do not occur with equal prior probability, there may not be sufficient training data to model each word. Furthermore, the training data may not cover all possible words in the language. Due to these data sparsity and word coverage issues, language technologies model subword units, or subunits, which are based on prior linguistic knowledge. For instance, the development of speech technologies such as automatic speech recognition systems presumes that a phonetic dictionary, or at least a writing system, exists for the target language. Such knowledge is not available for all languages in the world. In that direction, this article develops an abstract, hidden Markov model (HMM) based methodology to extract subword units given only pairwise comparisons between utterances (or realizations of words in the mode of communication), i.e., whether or not two utterances correspond to the same word. We validate the proposed methodology through investigations on spoken language and sign language. In the case of spoken language, we demonstrate that the proposed methodology can lead to the discovery of a phone set and the development of a phonetic dictionary. In the case of sign language, we demonstrate how hand movement information can be effectively modeled for sign language processing and synthesized back to gain insight into the derived subunits.
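As a rough illustration of how such pairwise same-word supervision can lead to subunits, consider the minimal sketch below. It is not the authors' algorithm: the grouping of utterances by word (which is all the pairwise same/different labels are used for here), the per-word Gaussian HMMs, and the k-means clustering of state distributions are all our own illustrative choices, and the function names are hypothetical.

    # Illustrative sketch only (not the method from the article): train one
    # Gaussian HMM per word from its utterances, then cluster the HMM state
    # means across all word models; each cluster plays the role of a shared
    # subword unit. Requires hmmlearn and scikit-learn.
    import numpy as np
    from hmmlearn import hmm
    from sklearn.cluster import KMeans

    def train_word_hmm(utterances, n_states=3):
        """Fit a Gaussian HMM to all feature sequences of one word."""
        X = np.vstack(utterances)               # stack (T_i, D) sequences
        lengths = [len(u) for u in utterances]  # per-sequence frame counts
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=50, random_state=0)
        model.fit(X, lengths)
        return model

    def derive_subunits(word_models, n_subunits=10):
        """Cluster state means of all word HMMs; cluster ids act as subunits."""
        state_means = np.vstack([m.means_ for m in word_models])
        km = KMeans(n_clusters=n_subunits, n_init=10, random_state=0)
        labels = km.fit_predict(state_means).reshape(len(word_models), -1)
        # Row w lists, state by state, the subunit ids of word w's HMM states.
        return labels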

Highlights

  • Communication is a vital part of human life

  • This paper addresses a computational linguistic paradigm where, given a set of speech utterances or sign productions together with pairwise comparisons indicating whether each pair corresponds to the same word or sign, the goal is to derive subword units (subunits) and link them to available prior linguistic knowledge

  • In the case of spoken language, we demonstrated that, for both the states of word-level hidden Markov models (HMMs) and the derived subword units, a probabilistic relationship to phones can be learned by exploiting auxiliary resources, so as to identify the phone set and obtain a phone-based pronunciation dictionary (a count-based sketch of such a relationship follows this list)
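One simple way to learn such a state-to-phone relationship is to normalize co-occurrence counts. This is our own minimal sketch, assuming frame-level state alignments and phone labels are available from an auxiliary resource; phone_given_state and the toy labels are hypothetical.

    # Hedged sketch: estimate P(phone | HMM state) by normalizing per-frame
    # co-occurrence counts of state and phone labels.
    from collections import Counter, defaultdict

    def phone_given_state(state_seq, phone_seq):
        """state_seq, phone_seq: equal-length per-frame label sequences."""
        counts = defaultdict(Counter)
        for s, p in zip(state_seq, phone_seq):
            counts[s][p] += 1
        return {s: {p: n / sum(c.values()) for p, n in c.items()}
                for s, c in counts.items()}

    probs = phone_given_state(["s1", "s1", "s2", "s2", "s2"],
                              ["k",  "k",  "ae", "ae", "t"])
    # probs["s2"] == {"ae": 2/3, "t": 1/3}: state s2 relates mostly to /ae/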

Introduction

Communication is a vital part of human life. Speech is the most common mode of communication in the hearing community, while the preferred mode of communication in the deaf community is sign language. Briefly, sign language is a visual mode of communication in which information is conveyed through multiple visual channels such as hand gestures (hand shape, location, position, and movement), facial expressions, body postures, and lip movements [1]. Spoken language technologies are considerably more mature than sign language technologies. One potential reason for that is that it has been well understood through linguistic studies that the time structure of spoken word units can be represented and modeled as a sequence of subword units such as phonemes/phones and syllables [3]. Such linguistically motivated subword units help in handling data scarcity issues when training statistical models and in handling words that are unseen during training, as illustrated in the sketch below.
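To make the unseen-word point concrete, here is a toy illustration (hypothetical lexicon and placeholder unit models, not from the article): once subword unit models exist, a model for a word never seen in training can be composed from its pronunciation, with no word-specific training data.

    # Toy illustration with a hypothetical lexicon: compose a word model by
    # concatenating shared subword-unit models, so unseen words still get one.
    phone_models = {"k": "HMM_k", "ae": "HMM_ae", "t": "HMM_t", "s": "HMM_s"}
    lexicon = {"cat": ["k", "ae", "t"], "cats": ["k", "ae", "t", "s"]}

    def compose_word_model(word):
        """Concatenate unit-level models along the word's pronunciation."""
        return [phone_models[p] for p in lexicon[word]]

    # "cats" may be absent from the training data, yet its model is available:
    print(compose_word_model("cats"))  # ['HMM_k', 'HMM_ae', 'HMM_t', 'HMM_s']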
