Abstract
Audio classification is a fundamental step in coping with the rapid growth in audio data volume. As multimedia sources grow in size, speech/music classification has become one of the most important issues in multimedia information retrieval. In this work, a speech/music discrimination system is developed that uses the Discrete Wavelet Transform (DWT) as the acoustic feature. Multi-resolution analysis is the most significant statistical way to extract features from the input signal, and in this study a method is deployed to model the extracted wavelet features. Support Vector Machines (SVM), which are based on the principle of structural risk minimization, are applied to classify audio into two classes, speech and music, by learning from training data. The proposed method then extends the application of Gaussian Mixture Models (GMM) to estimate the probability density function using maximum likelihood decision methods. The system achieves an accuracy of 94.5%.
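The wavelet feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which wavelet, how many decomposition levels, or which sub-band statistics were used, so the Haar wavelet, three levels, and the mean-absolute-value and standard-deviation statistics below are assumptions chosen for illustration only. The function names are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        signal = signal[:-1]  # drop the last sample so the pairs line up
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def band_stats(band):
    """Mean absolute value and standard deviation of one sub-band."""
    n = len(band)
    mean_abs = sum(abs(c) for c in band) / n
    mean = sum(band) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in band) / n)
    return [mean_abs, std]


def wavelet_features(signal, levels=3):
    """Assumed feature vector: two statistics per detail band, plus the
    final approximation band (2 * (levels + 1) values in total)."""
    if len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested number of levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        features.extend(band_stats(detail))
    features.extend(band_stats(approx))
    return features


# Example: a 440 Hz tone sampled at 8 kHz (synthetic data, not from the paper).
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = wavelet_features(frame, levels=3)
print(len(features))  # 2 statistics for each of the 4 sub-bands
```

A feature vector of this kind would then be passed to a classifier such as an SVM or a GMM. That stage is omitted here because the abstract gives no kernel or mixture parameters.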
Highlights
The term audio is used to indicate all kinds of audio signals, such as speech, music, and more general sound signals and their combinations.
An audio feature extraction and multi-group classification scheme that focuses on identifying discriminatory time-frequency subspaces using the Local Discriminant Bases (LDB) technique is described in Mishra and Agrawal (2012).
Feki et al. (2012) proposed a speech/music discrimination system based on Mel-Frequency Cepstral Coefficients (MFCC) and a Gaussian Mixture Model (GMM) classifier.
Summary
The term audio is used to indicate all kinds of audio signals, such as speech, music, and more general sound signals and their combinations. Compared to an ordinary speech signal, music has lower variability in zero-crossing rate. Speech is typically interspersed with segments of music and other background noise (Ghosal and Saha, 2011). These speech/music mixtures appear quite often in radio and television programmes. Infotainment productions and commercials contain speech, music, sound effects and background sounds. In commercials, these signal classes often appear mixed and change rapidly (Kim et al., 2012). SVM is used to classify the audio signal into speech and music.
Dhanalakshmi / Journal of Computer Science 10 (1): 34-44, 2014
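The summary's remark about zero-crossing rate (ZCR) variability can be made concrete with a short sketch. It computes the ZCR of each frame and then the variance of those rates across frames. The frame length of 256 samples and the use of non-overlapping frames are assumptions, the function names are hypothetical, and this statistic is shown only to illustrate the remark. The system described here uses DWT features, not ZCR.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_variance(signal, frame_len=256):
    """Variance of the per-frame ZCR over non-overlapping frames
    (frame_len is an assumed value; trailing samples are ignored)."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    rates = [zero_crossing_rate(signal[i:i + frame_len]) for i in starts]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


# Synthetic example: a steady alternating signal has the same ZCR in every
# frame, while a signal that switches character between frames does not.
steady = [1.0, -1.0] * 256
changing = [1.0, -1.0] * 128 + [1.0] * 256
print(zcr_variance(steady), zcr_variance(changing))
```

Under the summary's claim, a music excerpt would be expected to give a smaller value than a speech excerpt of similar length, although a threshold would have to be chosen from labelled data.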