Abstract

In automatic speech recognition (ASR), speaker and task adaptation has become an important research topic in recent years. Although the recognition performance of a speaker-independent (SI) ASR system can be sufficiently good for many tasks, a large performance gap remains between an SI system and a speaker-dependent (SD) ASR system. A number of successful adaptation techniques have been developed that modify speaker-independent model parameters to favor the speaker and task, given adaptation data. However, little attention has been given to designing the adaptation script effectively and efficiently so that the adapted system benefits most from the data. I have therefore chosen this aspect as the research topic of this thesis. In recent years, the idea of active learning has been applied to several ASR applications. Given a large amount of unlabelled training data, manual annotation effort can be reduced by intelligently selecting for manual labelling only the subset of training data that is most useful for building the application at hand. In this thesis, the concept of active learning is extended to supervised speaker and task adaptation, where the system takes the initiative in eliciting a small (ideally minimum) amount of adaptation data from the user to achieve a high (ideally maximum) performance improvement by adapting the hidden Markov models (HMMs) with the elicited data. Based on the concept of active learning, the task vocabulary confusability, which is closely related to the difficulty of the given task, is analyzed using a new HMM dissimilarity measure based on dynamic time warping (DTW). The adaptation script is then generated from this vocabulary confusability information. The adaptation script generation problem is cast as two constrained optimization problems with the same constraints but different objective functions.
The first is a maximum coverage problem with a knapsack constraint. The second is a nonlinear binary optimization problem with linear constraints. Two new approaches, namely a rank predicted pseudo-greedy approach and a variable-depth approach with gradient projection guidance, are proposed to solve these two optimization problems, respectively. With efficient script generation by these two approaches, the active approach generates the adaptation script much faster than the traditional approach without sacrificing recognition performance. Comparative experiments are designed and conducted for a simple application scenario: searching for an item in a long list by voice. The experimental results demonstrate that the proposed active adaptation strategy performs much better than traditional passive adaptation strategies.
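
To make the first formulation concrete, the following is a minimal sketch of the textbook cost-effectiveness greedy heuristic for maximum coverage under a knapsack (budget) constraint. It is not the rank predicted pseudo-greedy approach proposed in the thesis, whose details are not given here. All names (`greedy_budgeted_coverage`, the candidate utterances `u1` to `u4`, the covered items, their weights and costs) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_budgeted_coverage(
    candidates: Dict[str, Set[str]],  # candidate -> items it covers
    costs: Dict[str, float],          # candidate -> cost (assumed > 0)
    weights: Dict[str, float],        # item -> importance weight
    budget: float,                    # knapsack capacity
) -> Tuple[List[str], Set[str]]:
    """Repeatedly pick the affordable candidate with the best
    (newly covered weight) / cost ratio until nothing useful fits."""
    covered: Set[str] = set()
    chosen: List[str] = []
    spent = 0.0
    remaining = set(candidates)

    while remaining:
        best, best_ratio = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(remaining):
            cost = costs[name]
            if spent + cost > budget:
                continue  # would violate the knapsack constraint
            gain = sum(weights[item] for item in candidates[name] - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # no affordable candidate adds any coverage
        chosen.append(best)
        covered |= candidates[best]
        spent += costs[best]
        remaining.remove(best)

    return chosen, covered


# Hypothetical toy instance: utterances covering confusable items.
candidates = {
    "u1": {"a", "b"},
    "u2": {"b", "c"},
    "u3": {"c"},
    "u4": {"a", "b", "c", "d"},
}
costs = {"u1": 2, "u2": 2, "u3": 1, "u4": 5}
weights = {"a": 3, "b": 1, "c": 2, "d": 1}

chosen, covered = greedy_budgeted_coverage(candidates, costs, weights, budget=4)
```

In this toy instance the cost could stand for the length of an adaptation utterance and the weights for how confusable each covered item is, but that mapping is an assumption rather than the thesis's definition. The plain ratio greedy shown here can be arbitrarily far from optimal on some instances unless it is combined with further checks, which is one reason more refined heuristics are studied.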
