Abstract
Modern multimedia applications require a Speech/Music classifier to offer more than accuracy alone, including short delay and low complexity. In this paper, we build a Speech/Music classifier using several data mining methods. The main contributions are threefold: analyzing the inherent validity of diverse features extracted from the audio signal, building a hierarchical structure of oblique decision trees (HODT) to maintain optimal performance, and applying a novel context-based state transform (ST) strategy to refine the classification results. The proposed algorithm is evaluated on a set of 702 audio files, each 5-11 min long, generated from 54 speech or music files at different Signal-to-Noise Ratio (SNR) levels and with diverse noise types. The experimental results show that our classifier outperforms AMR-WB+, achieving classification rates of 97.9% and 95.9% at the 10 ms frame level in clean and high-SNR (>=20 dB) environments, respectively. The post-processing ST strategy further improves system performance, particularly in low-SNR conditions (10 dB), raising the accuracy rate by 5.6%. In addition, the complexity of the proposed system is below 1 WMOPS, which makes it easily adaptable to many scenarios.
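Although the abstract does not spell out the ST rules, the general idea of context-based refinement of per-frame decisions can be illustrated with a minimal sketch. The Python snippet below implements a generic hangover-style smoother, not the paper's exact ST strategy; the function name, labels, and `hangover` threshold are all assumptions made for illustration.

```python
# Hypothetical sketch of context-based smoothing of per-frame
# Speech/Music labels. A state change is accepted only once it
# persists for `hangover` consecutive frames, so short, isolated
# label flips are treated as noise. This is NOT the paper's exact
# ST strategy; all names and the threshold are illustrative.

SPEECH, MUSIC = 0, 1

def smooth_decisions(frame_labels, hangover=3):
    """Refine raw per-frame labels using neighbouring-frame context."""
    if not frame_labels:
        return []
    smoothed = []
    state = frame_labels[0]   # current confirmed state
    run = 0                   # consecutive frames disagreeing with state
    for label in frame_labels:
        if label == state:
            run = 0
        else:
            run += 1
            if run >= hangover:   # change confirmed by enough context
                state = label
                run = 0
        smoothed.append(state)
    return smoothed

# The isolated two-frame music burst is smoothed away, while the
# sustained music run is confirmed after three frames:
print(smooth_decisions([0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0], hangover=3))
# -> [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```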