Audio-based spam call detection

Benjamin M. Elizalde, Dimitra Emmanouilidou

doi:10.1121/10.0008583

Abstract

Spam communications are organized attempts to deliver falsified claims for the purposes of marketing, spreading false information, and deceiving the end recipient. Phone spam is an international nuisance, with the U.S. among the most spammed countries in the world in 2020. Beyond the annoyance these calls cause, criminal scams defraud subscribers of billions of dollars every year. It is therefore necessary to develop automated systems that identify spam calls, both to minimize fraud and to reduce the displeasure of receiving them. The call origin, call duration, and other Call Detail Records can be used to assess whether a call is fraudulent, but the actual audio content is overlooked. This work focuses on extracting acoustic features from voicemail recordings containing speech and using them to train Machine Learning models that identify spam calls. Both local and global feature descriptors are used, including Mel-Frequency Cepstral Coefficients and Log-Mel Spectrum, and their efficacy for distinguishing spam from non-spam calls is explored. We demonstrate that a spam voice call can be detected while relying only on the acoustic information of the call. A further analysis of the temporal and spectral features that are most informative for the task is also presented.
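As an illustration of the feature types named in the abstract, the following minimal sketch computes a Log-Mel Spectrum and Mel-Frequency Cepstral Coefficients as local (per-frame) descriptors, then summarizes them over time into a global (per-recording) descriptor. It is not the authors' pipeline: the frame length, hop size, number of mel bands, number of coefficients, the mean/standard-deviation summary, and all function names are assumptions chosen for the example, and the input is synthetic noise rather than a voicemail recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrum(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Local descriptor: log-mel energies per frame, shape (n_frames, n_mels)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)          # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)                 # small offset avoids log(0)

def mfcc(log_mel, n_coeffs=13):
    """Local descriptor: unnormalized DCT-II of the log-mel energies, first n_coeffs kept."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

def global_descriptor(local_features):
    """Global descriptor (assumed form): mean and standard deviation over time."""
    return np.concatenate([local_features.mean(axis=0), local_features.std(axis=0)])

# Synthetic stand-in for one second of audio at an assumed 8 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(8000)

log_mel = log_mel_spectrum(audio, sr=8000)
coeffs = mfcc(log_mel)
features = global_descriptor(coeffs)
print(log_mel.shape, coeffs.shape, features.shape)
```

A fixed-length vector such as `features` is the kind of input a standard classifier could be trained on with spam and non-spam labels; the choice of classifier is left open here because the abstract does not specify one.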
