Abstract

Environmental sounds are usually classified with convolutional neural networks. However, few studies have investigated how the network input should be constructed. In this paper, we investigate the impact of the time resolution index of the input spectrogram on classification performance, aiming to verify whether such an impact exists and to quantify it. To this end, we adopt an efficient convolutional network architecture for urban sound classification and conduct a series of experiments with different network inputs. The experimental results confirm our expectation and suggest that the best classification performance on urban sounds is usually obtained when the input spectrograms have a moderate time resolution, especially for sounds with relatively short temporal structures.
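The abstract does not define the "time resolution index" or state how the input spectrograms were computed. As an illustration only, the sketch below assumes that time resolution is controlled by the hop length of a Hann-windowed short-time Fourier transform. It shows how changing the hop length changes the number of time frames (and the time step per frame) in the network input while the frequency axis stays fixed. The function names, the 22050 Hz sampling rate, the 4 s clip length and the FFT size are all hypothetical choices, not the authors' settings.

```python
import numpy as np


def stft_magnitude(signal, n_fft=1024, hop_length=512):
    """Magnitude spectrogram from a Hann-windowed STFT (illustrative sketch).

    Returns an array of shape (n_fft // 2 + 1, n_frames). No padding is
    applied, so only frames that fit entirely inside the signal are kept.
    """
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Each column is one windowed frame; the hop length sets the time step.
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + n_fft] * window
         for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def time_resolution_ms(hop_length, sample_rate):
    """Time step between adjacent spectrogram frames, in milliseconds."""
    return 1000.0 * hop_length / sample_rate


if __name__ == "__main__":
    sample_rate = 22050                      # hypothetical sampling rate
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(4 * sample_rate)  # stand-in for a 4 s clip
    for hop in (256, 512, 1024):             # finer to coarser time resolution
        spec = stft_magnitude(clip, n_fft=1024, hop_length=hop)
        print(f"hop={hop:5d}  shape={spec.shape}  "
              f"step={time_resolution_ms(hop, sample_rate):.1f} ms")
```

A smaller hop gives more frames and a finer time step, which suits sounds with short temporal structures but enlarges the input; a larger hop does the opposite. The abstract's finding that a moderate setting tends to work best would correspond to an intermediate point on this trade-off under the stated assumption.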
