Abstract

The fine classification of audio utterances is a demanding problem: the extracted features must be highly accurate if they are to support effective classification. In this paper, results are presented for one such fine classification problem, namely the discrimination of two groups of different kinds of gunshots. Accurate classification divides into two parts: (i) feature extraction and (ii) classification; the more effective the feature extraction, the more effectively the classifier can categorize the audio samples. A novel method for the automatic recognition of acoustic utterances is presented that uses acoustic images, formed from the time-frequency distribution of an acoustic unit, as the basis for feature extraction. The feature extraction technique rests on a statistical analysis of the spectrogram, Hartley and Choi–Williams distributions of the data; a brief discussion of the classifier used is also given. Each acoustic image is compressed by this statistical analysis: the kurtosis, L-moments and entropy of the distributions, together with the energy, contrast and related measures of the corresponding co-occurrence matrices, are calculated and combined into a feature matrix, which is then presented to the classifier. Initial results indicate that the method is capable of accurate discrimination in this fine classification task.
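A minimal sketch of the feature-extraction pipeline the abstract describes, assuming NumPy/SciPy. The spectrogram stands in for all three time-frequency distributions (the Hartley and Choi–Williams distributions would be computed analogously), and only a subset of the statistics is shown: kurtosis and entropy of the distribution, plus energy and contrast of a hand-rolled co-occurrence matrix. Function names and parameters here are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.stats import kurtosis, entropy

def cooccurrence_features(img, levels=16):
    # Quantize the image to `levels` gray levels and build a normalized
    # co-occurrence matrix for a horizontal offset of one pixel.
    q = np.floor(img / (img.max() + 1e-12) * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()
    idx = np.arange(levels)
    energy = (glcm ** 2).sum()                                  # uniformity
    contrast = ((idx[:, None] - idx[None, :]) ** 2 * glcm).sum()  # local variation
    return energy, contrast

def extract_features(signal, fs):
    # Acoustic image: magnitude of the time-frequency distribution.
    _, _, S = spectrogram(signal, fs=fs, nperseg=256)
    flat = S.ravel()
    k = kurtosis(flat)                 # distributional statistic
    h = entropy(flat / flat.sum())     # Shannon entropy of the image
    energy, contrast = cooccurrence_features(S)
    return np.array([k, h, energy, contrast])

# Example: feature vector for a synthetic gunshot-like transient
np.random.seed(0)
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
sig = np.exp(-30 * t) * np.random.randn(t.size)  # decaying noise burst
feats = extract_features(sig, fs)
```

In the paper these statistics, computed per distribution, are stacked into a feature matrix before being handed to the classifier; here they are simply concatenated into one vector.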
