Abstract

The goal of this paper is to train a voice control system from a small amount of user speech data, without relying on a general acoustic model, for cases where such a model does not fit the user's voice because of known sources of variability (young age, voice disorders, non-native speech, etc.). We explore whether the recognition rate can be increased by requiring the speaker to stress every vowel in a command. We propose a novel modification of our fuzzy phonetic decoding method in which each vowel is associated with a fuzzy union of the sets of available reference signals from its class. First, syllables are detected and phoneme segmentation is performed. Second, the command is extracted from spontaneous speech by thresholding the ratio of the duration of homogeneous segments to the duration of the whole syllable. Finally, each syllable is associated with a fuzzy set of vowels, and commands are ranked by their similarity to the fuzzy set of the utterance. Experimental results on synthetic and real Russian datasets show that our method achieves higher accuracy than known recognition methods.
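
The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fuzzy operators, the similarity measure, or the threshold. The sketch assumes membership grades in [0, 1], the standard max operator for the fuzzy union, the min operator for combining syllables, and a threshold of 0.5 applied so that syllables with a larger homogeneous share are kept. All function names and the example commands are hypothetical.

```python
from typing import Dict, List, Sequence


def is_command_syllable(homogeneous_duration: float,
                        syllable_duration: float,
                        threshold: float = 0.5) -> bool:
    """Keep a syllable when homogeneous segments fill enough of it.

    The threshold value and the direction of the comparison are assumptions.
    """
    if syllable_duration <= 0:
        return False
    return homogeneous_duration / syllable_duration >= threshold


def fuzzy_union(memberships: Sequence[float]) -> float:
    """Standard (max) fuzzy union of membership grades; 0.0 if empty."""
    return max(memberships, default=0.0)


def vowel_fuzzy_set(reference_scores: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Map each vowel class to the union of grades over its reference signals."""
    return {vowel: fuzzy_union(scores) for vowel, scores in reference_scores.items()}


def rank_commands(syllable_sets: List[Dict[str, float]],
                  commands: Dict[str, List[str]]) -> List[str]:
    """Order commands by similarity to the utterance, best first.

    Similarity here is the min grade over aligned syllables (an assumed
    choice); commands with a different syllable count score 0.0.
    """
    def score(vowels: List[str]) -> float:
        if not vowels or len(vowels) != len(syllable_sets):
            return 0.0
        return min(fs.get(v, 0.0) for fs, v in zip(syllable_sets, vowels))

    return sorted(commands, key=lambda name: score(commands[name]), reverse=True)


# Two syllables, each with grades against several reference signals per vowel.
utterance = [
    vowel_fuzzy_set({"a": [0.2, 0.9], "o": [0.4]}),
    vowel_fuzzy_set({"a": [0.1], "o": [0.3, 0.8]}),
]
# Hypothetical commands described by their vowel sequences.
commands = {"cmd_oa": ["o", "a"], "cmd_ao": ["a", "o"], "cmd_a": ["a"]}
ranking = rank_commands(utterance, commands)
```

In this toy run the utterance matches the vowel sequence a, o best, so `cmd_ao` is ranked first. Taking the max over reference signals means a vowel is recognized if it resembles any one of the user's recorded examples, which suits the small-data setting the abstract describes.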
