Abstract

As the art of crosstalk (xiangsheng, a traditional Chinese comedic performance) declines, many studies have investigated the causes and devised strategies for its revival. Nonetheless, quantitative measures for assessing the quality of a crosstalk performance are often unavailable. This study uses recordings of traditional crosstalk performances to assess quality based on the proportion of positive audience feedback sounds, such as applause and laughter, over the course of a performance. We compare two audio feature extraction methods, Mel Frequency Cepstral Coefficients (MFCC) and Mel Filterbank Energies (MFE), and train one-dimensional convolutional neural network classifiers on each, in order to determine which combination performs better for audio classification under low-quality conditions such as a low sampling rate and a small signal-to-noise ratio. The training data were labeled under two schemes: one with four classes (laughter, applause, singing, and speaking), and one with three classes (speaking, singing, and combined laughter-and-applause). We report the final accuracy and loss of each model. The experimental results show that the model trained with MFE features under the four-class scheme (singing, applause, laughter, and performance speech) classifies crosstalk audio best.
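The abstract describes the pipeline only in prose. The following minimal sketch, written against the librosa and Keras libraries, illustrates one way to realize the MFCC-versus-MFE comparison and a one-dimensional CNN of the kind mentioned. All concrete values here (the 16 kHz sampling rate, 40 mel bands, 13 cepstral coefficients, the layer sizes, and the function names) are illustrative assumptions, not parameters reported by the paper.

    import librosa
    from tensorflow.keras import layers, models

    def extract_features(path, method="mfe", sr=16000, n_mels=40, n_mfcc=13):
        """Return a (frames, features) matrix using MFE or MFCC.
        sr, n_mels, and n_mfcc are illustrative defaults, not values
        taken from the paper."""
        y, sr = librosa.load(path, sr=sr)
        if method == "mfe":
            # Mel Filterbank Energies: log-scaled mel spectrogram.
            mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
            feats = librosa.power_to_db(mel)
        else:
            # Mel Frequency Cepstral Coefficients.
            feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return feats.T  # time-major: (frames, features)

    def build_1d_cnn(n_frames, n_features, n_classes=4):
        """A small 1-D convolutional classifier over frame sequences.
        The architecture is a plausible guess; the paper does not
        specify layer counts or sizes."""
        model = models.Sequential([
            layers.Input(shape=(n_frames, n_features)),
            layers.Conv1D(32, 3, activation="relu"),
            layers.MaxPooling1D(2),
            layers.Conv1D(64, 3, activation="relu"),
            layers.GlobalAveragePooling1D(),
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

Under the three-class scheme, n_classes would be 3, with laughter and applause merged into a single label. Training each feature/label combination and comparing held-out accuracy and loss reproduces the comparison the abstract describes.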
