This paper presents an automatic recognition system for vocal performance styles based on the residual network (ResNet), capable of accurately identifying multiple vocal styles. First, an audio dataset covering multiple vocal styles was constructed from the publicly available GTZAN dataset and preprocessed to extract key music features. Next, ResNet was adopted as the core recognition model and trained to learn the complex mapping between audio features and vocal styles, after which the full system was assembled. For validation, the dataset was split into a training set and a test set; the model was trained on the former and comprehensively evaluated on the latter. Experiments showed that the ResNet-based system performed well, achieving a precision of 95.6%, a recall of 91.6%, and an accuracy of 96.7%, with an average recognition accuracy of 95.96% across 10 vocal performance styles. Applying ResNet to the automatic recognition of vocal performance styles not only offers an efficient and reliable solution for this task but also opens new avenues for the further application of deep learning in the music industry.
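The two ideas the abstract leans on, the residual (skip) connection that defines ResNet and the precision/recall/accuracy metrics used in evaluation, can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function names (`residual_block`, `precision_recall_accuracy`), the toy 3-dimensional input, and the binary-metric formulation are assumptions for demonstration; the actual system operates on music features extracted from GTZAN audio and classifies 10 styles.

```python
def relu(v):
    """Element-wise ReLU on a plain list."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(v, W1, W2):
    """One residual block: relu(W2 @ relu(W1 @ v) + v).
    The skip connection adds the input back, so even an
    all-zero transform branch still passes the input through,
    which is what makes very deep networks trainable."""
    h = matvec(W2, relu(matvec(W1, v)))
    return relu([a + b for a, b in zip(h, v)])

def precision_recall_accuracy(y_true, y_pred, positive):
    """Precision, recall, and accuracy for one class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return tp / (tp + fp), tp / (tp + fn), acc

# With zero weights the transform branch vanishes and the skip
# connection alone carries the signal: output == relu(input).
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, -2.0, 3.0], zeros, zeros))  # [1.0, 0.0, 3.0]
```

The zero-weight case illustrates why residual learning helps: a block only needs to learn the *difference* from the identity mapping, rather than the full transformation.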