Abstract

Audio mining, also called audio searching, takes a spoken query and locates the search term in an audio/speech file. Spoken query search, or spoken term detection (STD), provides an efficient means of content-based retrieval of speech files. To perform STD, we use posterior features of both the spoken query and the audio files. The posterior features are computed from a Gaussian mixture model (GMM) trained on mel-frequency cepstral coefficients (MFCCs) of the audio files. A distance matrix is computed between the posterior features of the query and those of each audio file to locate the matched portions. To identify and extract the matched portions from the distance matrix, a novel pattern identification method based on image processing techniques and dynamic time warping (DTW) is proposed. With the proposed method, most of the audio files containing the spoken query are retrieved. The performance of the proposed method is evaluated using average precision (AP), a standard retrieval metric. Experiments were conducted on Hindi news speech data.
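
The distance-matrix and DTW steps described above can be illustrated with a minimal sketch. It assumes the GMM posterior features are already available as arrays whose rows sum to 1. The function names `distance_matrix` and `dtw_cost` are hypothetical, the negative-log inner product is an assumed distance (the abstract does not name one), and the DTW shown aligns the whole query against the whole segment. The image-processing pattern identification step of the proposed method is not reproduced here.

```python
import numpy as np


def distance_matrix(query_post, doc_post, eps=1e-10):
    """Pairwise distance between posterior frames of a query and an audio file.

    query_post: (m, k) array, doc_post: (n, k) array; each row sums to 1.
    The -log(inner product) distance is an assumption, not taken from the paper.
    """
    sim = query_post @ doc_post.T               # (m, n) frame-level similarity
    return -np.log(np.clip(sim, eps, None))     # clip avoids log(0)


def dtw_cost(dist):
    """Length-normalized cost of the best DTW path through a distance matrix.

    Plain full-alignment DTW with steps (1,0), (0,1), (1,1); a simplification
    of the matching step, used here only for illustration.
    """
    m, n = dist.shape
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[m, n] / (m + n)


# Toy posteriors over 3 GMM components (made-up values for illustration).
query = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
stretched = query[[0, 0, 1, 2, 2]]   # same pattern, spoken more slowly
reversed_ = query[[2, 2, 1, 0, 0]]   # different pattern

print(dtw_cost(distance_matrix(query, stretched)))
print(dtw_cost(distance_matrix(query, reversed_)))
```

In this toy case the time-stretched copy of the query yields a lower DTW cost than the reversed pattern, which is the property a retrieval system would rank on. A real system would search for the query inside longer recordings rather than align two complete sequences.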
