Abstract

This paper studies the application of deep convolutional neural networks to audio processing, particularly the classification of amplitude-frequency characteristics of audio signals. The task of matching audio fragments to each other is reduced to verifying objects by their feature representations. A large representative sample of audio signals was collected and supplemented with the Free Music Archive dataset to produce a training set for a deep convolutional neural network. The CQT-Net architecture is taken as the predictive model, with cosine similarity used to compare feature vectors. Four types of augmentation, including Gaussian noise, reverberation, pitch shifting, and tempo change of the audio signal, are used to prevent overfitting of the predictive model. The verification quality of the predictive model is tested on two separate datasets consisting of 1500 audio recordings excluded from the training dataset. Detection error tradeoff curves are plotted for all datasets, including test sets with altered tempo and altered pitch. Equal Error Rate is used as the model quality metric. The probability of identifying commonly used distortions of audio signals in the amplitude-frequency domain is evaluated to be higher than 92%, which demonstrates the reliability of the developed model.
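The verification scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding vectors, the decision threshold, and the helper names are assumptions; in the paper the embeddings would come from the CQT-Net model, and the operating threshold would be chosen from the DET curve.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature (embedding) vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb_query, emb_reference, threshold=0.85):
    # Accept the pair as matching if similarity exceeds the threshold.
    # The threshold value here is illustrative, not taken from the paper.
    return cosine_similarity(emb_query, emb_reference) >= threshold

def equal_error_rate(genuine_scores, impostor_scores):
    # Sweep candidate thresholds; the EER is the operating point where
    # the false-accept rate equals the false-reject rate.
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, best_eer = np.inf, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)    # genuine pairs wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

A lower EER indicates a more reliable verifier; the paper's reported identification probability above 92% corresponds to evaluating such a decision rule on held-out, augmented recordings.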
