Abstract

A keyword spotter can be viewed as a binary classifier that divides a set of uttered sentences into two groups according to whether or not they contain the target keywords. For this classification task, the keyword spotter needs a fast and accurate search algorithm to locate the target keywords. In our previous works, we used a modified Viterbi (M-Viterbi) search algorithm, which has two known drawbacks. First, to locate the target keywords, it runs an exhaustive search through all possible segments of the input speech. Second, while computing the start and end time-frames of each new phone, it forces the keyword spotter to trace back and re-evaluate the timing alignments of all previous phones, even though very little data, if any, is updated as a result. These two drawbacks dramatically enlarge the search space and significantly increase the computational complexity. In this paper, we propose a Hierarchical Search (H-Search) algorithm that allows the system to ignore some segments of the input speech at each level of the hierarchy, based on their lower likelihood of containing the target keywords. In addition, unlike the M-Viterbi algorithm, the H-Search algorithm does not require repeated evaluations when computing the current phone alignment. Compared with the M-Viterbi algorithm, this results in a narrower search space (O(TP) versus O(TPLmax), where T is the number of frames, P is the number of keyword phones, and Lmax is the maximum phone duration) and a lower computational complexity (O(TPLmax) versus O(TPLmax^3)). We applied the H-Search algorithm to the classification part of an Evolutionary Discriminative Keyword Spotting (EDKWS) system introduced in our previous works. The experimental results indicate that the H-Search algorithm runs 100 times faster than the M-Viterbi algorithm, while the performance of the EDKWS system degrades by no more than two percent relative to its performance with the M-Viterbi algorithm.
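
To make the complexity comparison above concrete, the following minimal sketch plugs sample values into the asymptotic expressions quoted in the abstract. The function name `search_costs` and the values chosen for T, P, and Lmax are hypothetical and serve only as an illustration. Because big-O expressions drop constant factors, the ratios below indicate how the two algorithms scale with Lmax and should not be read as a prediction of the measured speedup.

```python
def search_costs(T, P, L_max):
    """Evaluate the asymptotic expressions from the abstract, constants dropped.

    T: number of frames, P: number of keyword phones,
    L_max: maximum phone duration (in frames).
    """
    return {
        "m_viterbi_space": T * P * L_max,       # O(T P Lmax)
        "h_search_space": T * P,                # O(T P)
        "m_viterbi_time": T * P * L_max ** 3,   # O(T P Lmax^3)
        "h_search_time": T * P * L_max,         # O(T P Lmax)
    }


# Hypothetical sizes, chosen only for illustration.
costs = search_costs(T=300, P=6, L_max=20)

# The search-space ratio reduces to Lmax, the complexity ratio to Lmax^2.
space_ratio = costs["m_viterbi_space"] // costs["h_search_space"]
time_ratio = costs["m_viterbi_time"] // costs["h_search_time"]
print(space_ratio, time_ratio)  # prints "20 400"
```

In both ratios T and P cancel, so under these expressions the relative saving depends only on the maximum phone duration.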
