Audio Mining: Unsupervised Spoken Term Detection over an Audio Database

Kishore Kumar R, K Sreenivasa Rao, Sandipan Sarkar, Pradeep Rengaswamy

doi:10.1109/icacci.2018.8554731

Abstract

Audio mining, also called audio searching, takes a spoken query and locates the search term in an audio or speech file. Spoken query search, or spoken term detection (STD), provides an efficient means of content-based retrieval of speech files. To perform STD, posterior features of both the spoken query and the audio files are used. The posterior features are computed from a Gaussian mixture model (GMM) trained on mel-frequency cepstral coefficients (MFCC) of the audio files. A distance matrix is computed between the posterior features of the query and those of each audio file in order to locate the matched portions. To identify and extract the matched portions from the distance matrix, a novel pattern identification method is proposed that uses image processing techniques and dynamic time warping (DTW). With the proposed method, most of the audio files containing the spoken query are retrieved. Performance is evaluated using a standard metric, average precision (AP). Hindi news speech data was used for the experiments.
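
The matching and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the image-processing pattern identification step is not specified in the abstract, so a plain subsequence DTW is swapped in here. The negative-log dot-product distance, the cost normalization by query length, and the function names (distance_matrix, subsequence_dtw, average_precision) are all assumptions made for illustration. The posteriorgrams are assumed to be precomputed GMM posteriors, one probability vector per frame.

```python
import math

def distance_matrix(query, audio, eps=1e-10):
    """Frame-by-frame distances between two posteriorgrams.

    Each posteriorgram is a list of frames; each frame is a list of GMM
    component posteriors. The -log(dot product) distance is an assumption;
    the abstract does not name the distance measure.
    """
    return [[-math.log(max(sum(q * a for q, a in zip(qf, af)), eps))
             for af in audio] for qf in query]

def subsequence_dtw(dist):
    """Align the whole query (rows) to any contiguous audio stretch (columns).

    Returns (start_col, end_col, cost), where cost is the accumulated path
    cost divided by the query length (a simple assumed normalization).
    """
    n, m = len(dist), len(dist[0])
    cost = [[0.0] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = dist[0][j]  # the match may begin at any audio frame
        start[0][j] = j
    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + dist[i][0]
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Cheapest predecessor among diagonal, vertical, horizontal steps.
            best = min((cost[i - 1][j - 1], start[i - 1][j - 1]),
                       (cost[i - 1][j], start[i - 1][j]),
                       (cost[i][j - 1], start[i][j - 1]))
            cost[i][j] = dist[i][j] + best[0]
            start[i][j] = best[1]  # remember where this path began
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return start[n - 1][end], end, cost[n - 1][end] / n

def average_precision(ranked_relevance):
    """AP for one query over a ranked list of retrieved files.

    ranked_relevance[k] is True if the k-th retrieved file contains the
    query. Assumes every relevant file appears somewhere in the list.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Toy posteriorgrams over three GMM components (made-up values).
A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
query = [A, B, C]
audio = [C, C, A, B, C, A]  # the query pattern sits at frames 2..4
match_start, match_end, match_cost = subsequence_dtw(distance_matrix(query, audio))
```

In a retrieval setting, each audio file would be scored by its best match cost, the files ranked by that cost, and the ranking passed to average_precision along with ground-truth relevance labels.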

Similar Papers

Representation Learning for Spoken Term Detection
P Raghavendra Reddy ... B Yegnanarayana
07 Dec 2016

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection
Peng Yang ... Haizhou Li
14 Sep 2014

Speed improvements to Information Retrieval-based dynamic time warping using hierarchical K-Means clustering
Gautam Mantena ... Xavier Anguera
01 May 2013

Unsupervised spoken term detection with acoustic segment model
Haipeng Wang ... Cheung-Chi Leung
01 Oct 2011
