Abstract

Face masks alter the speaker's voice, as their intrinsic material properties give them acoustic absorption capabilities; face masks therefore act as filters on the human voice. This work focuses on the automatic detection of face masks from speech signals, building on previous work claiming that face masks attenuate frequencies above 1 kHz. We compare a paralinguistics-based and a spectrogram-based approach to the task at hand. While the former extracts paralinguistic features from filtered versions of the original speech samples, the latter exploits spectrogram representations of the speech samples containing specific frequency ranges. The machine learning techniques investigated for the paralinguistics-based approach include Support Vector Machines (SVM) and a Multi-Layer Perceptron (MLP). For the spectrogram-based approach, we use a Convolutional Neural Network (CNN). Our experiments are conducted on the Mask Augsburg Speech Corpus (MASC), released for the Interspeech 2020 Computational Paralinguistics Challenge (COMPARE). The best performances on the test set from the paralinguistic analysis are obtained using the high-pass filtered versions of the original speech samples. Nonetheless, the highest Unweighted Average Recall (UAR) on the test set is obtained when exploiting the spectrograms with frequency content below 1 kHz.
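The two preprocessing routes described above can be sketched as follows. This is a minimal illustration on a synthetic signal; the filter order, the 1 kHz cutoff placement, and the spectrogram settings are assumptions for demonstration, not the paper's exact configuration:

```python
import numpy as np
from scipy import signal

# Synthetic stand-in for a speech sample: a 16 kHz signal with one
# component below 1 kHz (300 Hz) and one above (3 kHz).
sr = 16000
t = np.arange(sr) / sr  # 1 second
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

# Route 1 (paralinguistics-based): high-pass filter the waveform at
# 1 kHz before feature extraction. A 4th-order Butterworth filter
# applied with filtfilt is an assumed choice, not the paper's.
b, a = signal.butter(4, 1000, btype="highpass", fs=sr)
x_hp = signal.filtfilt(b, a, x)

# Route 2 (spectrogram-based): compute a spectrogram and keep only the
# frequency bins below 1 kHz as the model input.
f, frames, S = signal.spectrogram(x, fs=sr, nperseg=512)
S_low = S[f < 1000, :]

print("full spectrogram:", S.shape, "band-limited:", S_low.shape)
```

With `nperseg=512` at 16 kHz the bin spacing is 31.25 Hz, so the band-limited spectrogram retains 32 frequency rows; the high-pass route instead suppresses the 300 Hz component while leaving the 3 kHz one essentially untouched.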

Highlights

  • Respiratory viruses, such as the Coronavirus Disease 2019 (COVID-19), are transmitted via direct contact with an infected person or a contaminated surface, and through respiratory droplets containing the virus, which can be suspended in the air for a long time and over a long distance [1]

  • As the filtering effect can be observed in the frequency domain, we investigate the performance of models based on a Convolutional Neural Network (CNN) to exploit the spectrograms of the original audio signals

  • The performance of our models is comparable with the model performances reported in the baseline paper of the COMPARE 2020 Challenge [11], whose best model scored an Unweighted Average Recall (UAR) of 71.8 % on the test set using a fusion of the best approaches


Summary

Introduction

Respiratory viruses, such as the Coronavirus Disease 2019 (COVID-19), are transmitted via direct contact with an infected person or a contaminated surface, and through respiratory droplets containing the virus, which can remain suspended in the air for a long time and travel over long distances [1]. According to the World Health Organisation (WHO), current evidence supports the hypothesis that COVID-19 mainly spreads via respiratory droplets among people in close contact when coughing, sneezing, speaking, singing or breathing heavily. Governments worldwide have mandated the wearing of face masks on public transport, in public spaces, frequented streets, shops, and even workplaces to control the spread of COVID-19. The need to check compliance with this precautionary measure has motivated the development of new digital solutions for the automatic detection of face masks from both visual and acoustic signals.

