With the continuous development of communication technology, computer technology, and network technology, a large amount of information such as images, videos, and audios has grown exponentially, and people have started to be exposed to massive multimedia contents, which can easily and quickly access the increasingly rich music resources, so new technologies are urgently needed for their effective management, and automatic classification of audio signals has become the focus of engineering and academic attention. Currently, music retrieval can be achieved by selecting song titles and singer names, but as people’s living standards continue to improve, the spiritual realm is also enriched. People want to be able to select music with different types of emotional expressions with their emotions. It mainly includes the basic principles of audio classification, the analysis and extraction of music emotion features, and the selection of the best classifier. Two classification algorithms, hybrid Gaussian model and AdaBoost, are used to classify music emotions, and the two classifiers are combined. In this paper, we propose the Discrete Harmonic Transform (DHT), a sparse transform based on harmonic frequencies. This paper derives and proves the formula of Discrete Harmonic Transform and further analyzes the harmonic structure of musical tone signal and the accuracy of harmonic structure. Since the timbre of musical instruments depends on the harmonic structure, and similar instruments have similar harmonic structures, the discrete harmonic transform coefficients can be defined as objective indicators corresponding to the timbre of musical instruments, and thus the concept of timbre expression spectrum is proposed, and a specific construction algorithm is given in this paper. In the application of musical instrument recognition, the 53-dimensional combined features of LPCC, MFCC, and timbre expression spectrum are selected, and a nonlinear support vector machine is used as the classifier. The classification recognition rate is improved by reducing the number of feature dimensions.
Read full abstract