Abstract

Audio scene recognition enables devices to understand their environment by analyzing digital audio. It is a branch of computational auditory scene analysis and is already widely used in intelligent wearable devices, robot sensing services, and other application scenarios. To explore the applicability of machine learning to digital audio scene recognition, we propose a recognition method based on optimized audio processing and a convolutional neural network. First, unlike traditional audio feature extraction based on mel-frequency cepstral coefficients, the proposed method uses a binaural representation and harmonic-percussive source separation to preprocess the original audio and extract the corresponding features, so that the system can exploit the spatial characteristics of the scene and thereby improve recognition accuracy. Second, an audio scene recognition system with a two-layer convolution module is designed and implemented. The network structure borrows from VGGNet, an architecture from the field of image recognition, to increase network depth and improve system flexibility. Experimental analysis shows that, compared with traditional machine learning methods, the proposed method greatly improves the recognition accuracy for each scene and generalizes better across different data.
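The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: harmonic-percussive separation is done here by median filtering a magnitude spectrogram with binary masks, which is one common approach, and the binaural representation is taken to mean left, right, mid, and side channels. The function names `median_filter`, `hpss`, and `binaural_channels` are hypothetical, and the toy spectrogram stands in for real audio.

```python
from statistics import median


def median_filter(values, width=3):
    """Median-filter a list, truncating the window at both edges."""
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def hpss(spec, width=3):
    """Split a magnitude spectrogram (rows = frequency bins, columns = frames)
    into harmonic and percussive parts with binary masks.
    Assumption: median-filtering HPSS, not necessarily the paper's variant."""
    n_bins, n_frames = len(spec), len(spec[0])
    # Harmonic content is steady over time: smooth each frequency row.
    harm = [median_filter(row, width) for row in spec]
    # Percussive content is broadband: smooth each time column over frequency.
    perc_cols = [
        median_filter([spec[f][t] for f in range(n_bins)], width)
        for t in range(n_frames)
    ]
    # Each cell goes to whichever smoothed estimate is larger (ties: harmonic).
    harmonic = [
        [spec[f][t] if harm[f][t] >= perc_cols[t][f] else 0.0
         for t in range(n_frames)]
        for f in range(n_bins)
    ]
    percussive = [
        [spec[f][t] - harmonic[f][t] for t in range(n_frames)]
        for f in range(n_bins)
    ]
    return harmonic, percussive


def binaural_channels(left, right):
    """Derive mid and side signals from a stereo pair.
    Assumption: mid = (L + R) / 2 and side = L - R."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return {"left": left, "right": right, "mid": mid, "side": side}


# Toy spectrogram: a steady tone in bin 2 plus a click in frame 2.
spec = [[1.0 if f == 2 or t == 2 else 0.0 for t in range(5)] for f in range(5)]
harmonic, percussive = hpss(spec)
```

In this toy case the steady tone ends up in `harmonic` and the click in `percussive`. A real system would compute features separately on each separated component and each channel before passing them to the network.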

Highlights

  • As an information carrier, sound is an important way for us to perceive the external environment

  • With the development of signal processing technology and computer science, the machine-assisted task of extracting information from sound has attracted growing attention from researchers [1,2,3,4,5,6]

  • The main goal of audio scene recognition is to enable devices to understand and distinguish their environment by analyzing sound. The implementation principle is that the device extracts audio features from the signal and models the audio scene according to these features, that is, constructs a classifier


Summary

Introduction

Sound is an important way for us to perceive the external environment. With the development of signal processing technology and computer science, the machine-assisted task of extracting information from sound has attracted growing attention from researchers [1,2,3,4,5,6]. The implementation principle of audio scene recognition is that the device extracts audio features from the signal and models the audio scene according to these features, that is, constructs a classifier. Audio scene recognition can also improve the performance of sound event detection by providing a priori information about the probability of some events. Implementations of audio scene recognition have often applied general-purpose classifiers. With a large amount of audio data, deep learning methods that were difficult to realize in the past become feasible.
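The feature-then-classifier principle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's system, which uses a convolutional neural network: the two features (mean energy and zero-crossing rate), the nearest-centroid classifier, the function names, and the scene labels `quiet_room` and `street` are all hypothetical choices made for brevity.

```python
import math


def clip_features(samples):
    """Map a clip to [mean energy, zero-crossing rate]."""
    energy = sum(x * x for x in samples) / len(samples)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return [energy, crossings / (len(samples) - 1)]


def fit_centroids(clips, labels):
    """Model each scene by the mean feature vector of its training clips."""
    sums, counts = {}, {}
    for clip, label in zip(clips, labels):
        feats = clip_features(clip)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids, clip):
    """Return the scene whose centroid is closest in feature space."""
    feats = clip_features(clip)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Toy training data: a quiet, slowly varying clip and a loud, rapidly
# alternating one (both invented for illustration).
train_clips = [
    [0.1, 0.1, 0.1, 0.1, -0.1, -0.1, -0.1, -0.1],
    [0.9, -0.9, 0.9, -0.9, 0.9, -0.9, 0.9, -0.9],
]
train_labels = ["quiet_room", "street"]
centroids = fit_centroids(train_clips, train_labels)
scene = predict(centroids, [0.8, -0.8, 0.8, -0.7, 0.8, -0.8, 0.8, -0.8])
```

A deep learning approach replaces both the hand-picked features and the simple classifier with a network trained on spectrogram-like inputs, which is practical only when enough labelled audio is available.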

Literature Review
Audio Processing Optimization Method
Audio Scene Recognition Method Based on Convolutional Neural Network
Experimental Results and Analysis