Feature Extraction Based on the Non-Negative Matrix Factorization of Convolutional Neural Networks for Monitoring Domestic Activity With Acoustic Signals

Seokjin Lee, Hee-Suk Pang

doi:10.1109/access.2020.3007199

Abstract

In this paper, a feature extraction method based on non-negative matrix factorization (NMF) is proposed for classifiers that monitor domestic activities with acoustic signals. Most classifiers of acoustic signals use data-independent spectral features (e.g., log-Mel spectrum and Mel-frequency cepstral coefficients). Recently, some novel feature extraction methods have been researched, including convolutional-NMF-based features combined with K-means clustering. This study proposes an enhanced NMF-based feature extraction method that is inspired by the NMF-based noise reduction algorithm. The proposed method independently estimates the frequency basis matrix for each class and then cascades the basis matrices to form the entire set of frequency bases; the acoustic signal is transformed to the proposed feature by estimating the temporal basis matrix with the trained frequency bases. In addition, this study proposes a data augmentation method for the proposed feature that is inspired by the “mix and shuffle” method for audio waveforms. To evaluate the proposed system, which consists of the proposed NMF-based feature and a convolutional-neural-network-based classifier, evaluations were performed using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 5 - Monitoring of Domestic Activities Based on Multi-channel Acoustics - Database. The results showed that the proposed system has performance comparable to that of state-of-the-art algorithms and that it improves the F1-score by 6%-12% in comparison with the conventional NMF-based feature extraction method that is based on convolutional NMF and K-means clustering.

Highlights

Acoustic scene classification (ASC) aims to automatically recognize environments from acoustic signals
In order to compare the performance of the proposed feature with that of various existing features, the proposed system was compared with systems utilizing conventional features, including constant-Q transforms (CQT) [51], [52], power-normalized cepstral coefficient (PNCC) [53], Mel-frequency discrete wavelet coefficients (MFDWC) [14], gammatonegram (GAM) [17], and gammatone frequency cepstral coefficient (GFCC) [18]
In this paper, a non-negative matrix factorization (NMF)-based feature extraction method is proposed for domestic activity monitoring tasks using sound signals

Summary

INTRODUCTION

Acoustic scene classification (ASC) aims to automatically recognize environments from acoustic signals. The NMF method has been tried in previous studies for acoustic scene classification and sound event detection tasks. These previous investigations utilized the NMF method as an auxiliary tool for pre-processing the input signal or for the activity classifier, rather than as a feature extraction method. If the NMF algorithm is applied to a magnitude spectrogram of a music signal that consists of three musical events, each column vector of the matrix W may correspond to the frequency structure of a musical event, and each row vector of the matrix H may correspond to its temporal envelope, as shown in Fig. 3 (a). By focusing on these characteristics of the NMF method for acoustic signals, several NMF applications have been developed, e.g., speech denoising [28], [29] and active sonar reverberation suppression [44], as shown in Fig. 3 (b). While the NMF method requires numerous multiplications, the mix and shuffle uses no multiplication, and the proposed data augmentation method can generate a large amount of data with very light operations.
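The factorization of a magnitude spectrogram V into frequency bases W and temporal activations H, together with the class-wise training and cascading of bases described in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the standard Euclidean multiplicative update rules, and the function names (nmf, train_cascaded_bases, extract_feature), the rank per class, the iteration count, and the random stand-in spectrograms are all hypothetical choices made for the example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates


def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Approximate V (frequency x time) as W @ H with non-negative factors.

    Assumes Euclidean multiplicative updates. If W_fixed is given,
    only H is updated and the bases are left as trained.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = (rng.random((n_freq, rank)) + EPS) if W_fixed is None else W_fixed
    H = rng.random((W.shape[1], n_time)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if W_fixed is None:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def train_cascaded_bases(spectrograms_by_class, rank_per_class):
    """Fit one frequency basis matrix per class and stack them column-wise."""
    bases = [nmf(V, rank_per_class)[0] for V in spectrograms_by_class]
    return np.hstack(bases)


def extract_feature(V, W_all, n_iter=200):
    """Feature = activations H of V over the fixed, cascaded bases."""
    return nmf(V, W_all.shape[1], n_iter=n_iter, W_fixed=W_all)[1]


# Hypothetical stand-ins for per-class training spectrograms (64 bins, 100 frames).
rng = np.random.default_rng(1)
class_a = rng.random((64, 100))
class_b = rng.random((64, 100))

W_all = train_cascaded_bases([class_a, class_b], rank_per_class=4)
feat = extract_feature(rng.random((64, 50)), W_all)
print(W_all.shape, feat.shape)  # (64, 8) (8, 50)
```

In this sketch the feature passed to a classifier would be the activation matrix, whose rows are grouped by the class whose bases produced them. Keeping W fixed at extraction time means only the H update runs per input signal.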

NETWORK STRUCTURE OF THE CLASSIFIER

COMPARISONS WITH CONVENTIONAL FEATURES

Findings

CONCLUSION