Abstract

The lack of core music literacy in vocal music teaching is a problem that needs to be addressed, and this paper aims to improve it. A hybrid attention module is added to the multi-channel MFCC branch to extract the acoustic elements of musical instruments; the instruments are then characterized as a multi-view sequence in which visual and temporal features are fused, a graph convolution module is added to extract their visual elements, and the two sets of features are fused. On the basis of the extracted instrument elements, the development of fused vocal music teaching is analyzed with the gray correlation analysis method, and experiments are designed to examine both the extraction performance of instrument elements and the development of the fused pedagogy. The results show that the average accuracy of the multimodal extraction model on instrumental elements is 0.919, an improvement of 0.17 over the baseline classification accuracy, and the extraction accuracy exceeds 90% for flute, percussion, plucked, and wind instruments as well as the suona, xiao, sheng, pipa, and erhu. The gray correlation of the fourth grade ranges from 0.75 to 0.88, indicating a clear improvement within the same grade, while the gray correlations across different grades all lie between 0.3 and 1.0, with higher grades showing better development.
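The abstract does not detail how the hybrid attention module operates on the multi-channel MFCC features. As an illustration only, the sketch below shows one common way a hybrid (channel plus spatial) attention block can re-weight multi-channel MFCC feature maps in PyTorch; the layer sizes, tensor shapes, and hyperparameters are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: hybrid (channel + spatial) attention over multi-channel MFCC maps.
# Shapes and layer sizes are illustrative assumptions, not the paper's actual network.
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: squeeze the time-frequency dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: re-weight individual time-frequency positions.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, n_mfcc, time)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                                   # (b, c)
        mx = x.amax(dim=(2, 3))                                    # (b, c)
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch.view(b, c, 1, 1)
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)       # (b, 2, n_mfcc, time)
        return x * torch.sigmoid(self.spatial_conv(sp))

# Usage on assumed shapes: 4 clips, 32 intermediate feature channels,
# 40 MFCC coefficients, 200 frames.
feats = torch.randn(4, 32, 40, 200)
out = HybridAttention(channels=32)(feats)
print(out.shape)  # torch.Size([4, 32, 40, 200])
```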
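Likewise, the gray correlation figures reported in the results can be reproduced in outline with Deng's classic grey relational analysis. The sketch below assumes the standard formulation; the distinguishing coefficient rho = 0.5 and the toy sequences are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of Deng's grey relational analysis, assuming the standard formulation;
# rho = 0.5 and the sample sequences are illustrative, not taken from the paper.
import numpy as np

def grey_relational_grade(reference: np.ndarray, series: np.ndarray, rho: float = 0.5) -> np.ndarray:
    """reference: (n_points,); series: (n_series, n_points). Returns one grade per series."""
    # Normalize every sequence to [0, 1] so magnitudes are comparable.
    all_seq = np.vstack([reference, series]).astype(float)
    lo, hi = all_seq.min(axis=1, keepdims=True), all_seq.max(axis=1, keepdims=True)
    all_seq = (all_seq - lo) / np.where(hi - lo == 0, 1, hi - lo)
    ref, cmp = all_seq[0], all_seq[1:]

    # Grey relational coefficients, then the grade as their mean over points.
    delta = np.abs(cmp - ref)
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)

# Example: one reference indicator sequence and three grade-level sequences.
reference = np.array([0.90, 0.85, 0.92, 0.88])
grades = np.array([[0.70, 0.60, 0.80, 0.75],
                   [0.85, 0.80, 0.90, 0.86],
                   [0.50, 0.55, 0.60, 0.58]])
print(grey_relational_grade(reference, grades))  # higher grade = closer to the reference
```

A grade closer to 1 means the compared sequence develops almost in step with the reference sequence, which is how the abstract's per-grade correlation ranges should be read.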
