Abstract

Self-supervised learning of deep neural networks (DNNs) has been widely employed for detecting anomalies in sounds generated by electronic devices. In self-supervised learning, a DNN model is trained on normal data to solve a pretext task, and test data on which the task performance degrades are regarded as anomalous. Popular choices for the pretext task are reconstruction and classification, in which a model is trained to predict masked parts of a spectrogram or to classify the internal classes of normal data, respectively. However, the reconstruction task struggles to distinguish anomalies from noise in noisy conditions, and the classification task often fails to learn meaningful features when the diversity across internal classes is either too small or too obvious. We propose a combination of prediction and segmentation tasks to overcome these limitations. For the proposed tasks, two different machine sounds are mixed at a constant ratio, and a model is trained to predict both the mixed spectrogram at future time frames and the mixing ratio from the present and past sound mixture. We train a WaveNet-based model on both tasks simultaneously, which shows remarkable performance improvements over conventional models and achieves state-of-the-art performance on the DCASE 2020 Task 2 dataset.
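To make the dual pretext task concrete, below is a minimal sketch in PyTorch of how spectrogram mixing and joint training might look. The names (MIX_RATIO, DualTaskNet, pretext_step) and the simple convolutional encoder are illustrative assumptions, not the paper's WaveNet-based architecture; the paper's exact loss weighting and feature pipeline are also not specified here.

```python
# Illustrative sketch only: a toy network standing in for the WaveNet-based model.
import torch
import torch.nn as nn
import torch.nn.functional as F

MIX_RATIO = 0.7  # assumed constant mixing ratio between the two machine sounds


class DualTaskNet(nn.Module):
    """Predicts the next spectrogram frame and the mixing ratio from past frames."""

    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.encoder = nn.Conv1d(n_mels, hidden, kernel_size=3, padding=1)
        self.frame_head = nn.Linear(hidden, n_mels)  # future-frame prediction head
        self.ratio_head = nn.Linear(hidden, 1)       # mixing-ratio regression head

    def forward(self, past_frames):                  # (batch, n_mels, time)
        h = F.relu(self.encoder(past_frames))
        h_last = h[:, :, -1]                         # summary of present and past context
        return self.frame_head(h_last), torch.sigmoid(self.ratio_head(h_last))


def pretext_step(model, spec_a, spec_b, optimizer):
    """One training step: mix two machine spectrograms, predict the next frame and the ratio."""
    mixed = MIX_RATIO * spec_a + (1.0 - MIX_RATIO) * spec_b   # (batch, n_mels, T)
    past, future = mixed[:, :, :-1], mixed[:, :, -1]          # last frame is the prediction target
    pred_frame, pred_ratio = model(past)
    target_ratio = torch.full_like(pred_ratio, MIX_RATIO)
    loss = F.mse_loss(pred_frame, future) + F.mse_loss(pred_ratio, target_ratio)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time, consistent with the abstract's framing, samples on which the pretext tasks perform poorly (e.g., large prediction or ratio-estimation error) would be scored as anomalous.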
