In automatic speech recognition (ASR) systems, hidden Markov models (HMMs) have been widely used for modeling the temporal speech signal. As discussed in Part I, the conventional acoustic models used for ASR have many drawbacks, such as weak duration modeling and poor discrimination. This paper (Part II) reviews the techniques proposed in the literature for refining standard HMM methods to cope with these limitations. Current advancements related to this topic are also outlined. The approaches emphasized in this part of the review are the connectionist approach, explicit duration modeling, discriminative training, and margin-based estimation methods. Further, various challenges and performance issues, such as environmental variability, tied mixture modeling, and the handling of distant speech signals, are analyzed, along with directions for future research.