Abstract

Language-independent spoken term detection (LI-STD) is the task of locating occurrences of spoken queries in speech databases of any language. This paper presents a multilingual broad phoneme classifier (BPC) and its application to the development of an LI-STD system. It proposes a multi-stage architecture for LI-STD in low-resourced languages, where only a limited amount of labelled training data is available. The proposed LI-STD system has three stages: one label sequence matching stage and two template matching stages. A deep neural network (DNN) based BPC, trained on 16 handcrafted, signal-based features, is the backbone of the system. Stage 1 performs broad phoneme sequence matching, while stages 2 and 3 perform template matching on posteriorgram and feature sequences, respectively. Cascading the stages reduces the search space for the later, computationally intensive template matching stages. To adapt to a new or unseen language, the BPC is retrained on selected data from that language, using broad phoneme labels that the BPC itself generates. The effectiveness of the proposed system is demonstrated on a set of low-resourced Indian languages.
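
The three-stage cascade described above can be illustrated with a minimal sketch. The abstract does not say which matching algorithms are used, so everything below is an assumption: stage 1 is approximated with edit distance over broad phoneme label sequences, and stages 2 and 3 with dynamic time warping (DTW), a common choice for template matching. The function names, the dictionary keys (`labels`, `posteriors`, `features`, `id`) and the thresholds are hypothetical and chosen only for illustration.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two label sequences (assumed stage 1 measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def dtw_cost(query, segment):
    """Length-normalised DTW cost between two sequences of frame vectors."""
    n, m = len(query), len(segment)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], segment[j - 1])  # Euclidean frame distance
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def cascade_search(query, candidates, label_thresh, post_thresh, feat_thresh):
    """Return the ids of candidates that survive all three stages, in input order."""
    hits = []
    for cand in candidates:
        # Stage 1: cheap broad phoneme label sequence matching prunes most candidates.
        if edit_distance(query["labels"], cand["labels"]) > label_thresh:
            continue
        # Stage 2: template matching on posteriorgram frames.
        if dtw_cost(query["posteriors"], cand["posteriors"]) > post_thresh:
            continue
        # Stage 3: template matching on the raw feature frames.
        if dtw_cost(query["features"], cand["features"]) > feat_thresh:
            continue
        hits.append(cand["id"])
    return hits
```

Because each stage only sees the candidates that passed the previous one, the quadratic-time DTW comparisons run on a reduced search space, which is the benefit the abstract attributes to chaining the stages.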
