Development of a regional voice dataset and speaker classification based on machine learning

Muhammad Ismail, Shahid Munir Shah, Dostdar Hussain, Lachhman Das Dhomeja, Imran Ali, Shahzad Memon, Sabit Rahim

doi:10.1186/s40537-021-00435-9

Open Access

Abstract

Voice biometrics are now commonly used to identify and authenticate users by their voice. Mobile banking, access to personal devices, and logging into social networks are common examples of voice-based services that authenticate users this way. In Pakistan, voice-based services are widespread in the banking and mobile/cellular sectors, but they do not use voice features to recognize customers, so the chance of these services being used under a false identity is always high. A voice-based recognition system is therefore essential to minimize that risk. In this paper, we developed regional voice datasets for voice biometrics by collecting voice data in different local accents of Pakistan. Although the need for voice biometrics is global, especially where voice-based services are common, this paper uses Pakistan as a use case to show how to build a regional voice dataset for voice biometrics. To build the datasets, voice samples were recorded from 180 male and female speakers in two languages, English and Urdu, in five regional accents. Mel Frequency Cepstral Coefficient (MFCC) features were extracted from the collected voice samples to train Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF), and K-nearest neighbor (KNN) classifiers. The results indicate that ANN outperformed SVM, RF, and KNN, achieving 88.53% and 86.58% recognition accuracy on the two datasets, respectively.

Highlights

Human body characteristics such as voice, face, fingerprint, and gait have long been used for identification and verification [1].
In the first experiment, the performance of Artificial Neural Network (ANN), Support Vector Machine (SVM), K-nearest neighbor (KNN), and Random Forest (RF) models was evaluated on the feature vectors obtained from the English voice dataset; in the second experiment, the performance of the same models was evaluated on the feature vectors obtained from the Urdu voice dataset.
In this paper, the authors have designed voice datasets in the Urdu and English languages with five different regional accents spoken in Gilgit-Baltistan (GB), located in the north of Pakistan.
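
The two-experiment evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only one of the four classifier families named in the paper (K-nearest neighbor), written in plain Python, and it runs on synthetic 13-dimensional vectors that merely stand in for MFCC features. The helper names (`make_toy_features`, `knn_predict`, `evaluate`), the 80/20 split, and `k=3` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def make_toy_features(n_speakers, samples_per_speaker, dim, seed):
    """Synthetic stand-in for MFCC vectors: one Gaussian cluster per speaker.

    Hypothetical data only; no real voice recordings are involved.
    """
    rng = random.Random(seed)
    data = []
    for speaker in range(n_speakers):
        centre = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        for _ in range(samples_per_speaker):
            vec = [c + rng.gauss(0.0, 0.5) for c in centre]
            data.append((vec, speaker))
    rng.shuffle(data)
    return data


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(dataset, train_fraction=0.8, k=3):
    """Hold out the tail of the dataset and return recognition accuracy on it."""
    split = int(len(dataset) * train_fraction)
    train, test = dataset[:split], dataset[split:]
    correct = sum(knn_predict(train, vec, k) == label for vec, label in test)
    return correct / len(test)


# Same model, two separate feature sets, mirroring the two experiments.
results = {
    name: evaluate(make_toy_features(5, 20, 13, seed))
    for name, seed in [("english", 0), ("urdu", 1)]
}
for name, acc in results.items():
    print(f"{name}: recognition accuracy = {acc:.2%}")
```

The accuracies printed here reflect only the synthetic clusters and say nothing about the figures reported in the paper; the point is the shape of the protocol, in which the same model is trained and scored separately on each language's feature vectors.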

Summary

Introduction

Human body characteristics such as voice, face, fingerprint, and gait have long been used for identification and verification [1]. Biometric identification is based on biometric traits, which broadly fall into two categories: physiological traits (fingerprint, face, iris, vein, ear, DNA, etc.) and behavioral traits (voice, keystroke dynamics, signature, gait, etc.). In some situations, the fingerprint trait is more desirable than the voice trait. In others, voice is preferable to fingerprint, such as access control for bank transactions via cell phones or landline telephones, voice mail, verification of credit cards, remote access to computers through a modem on a dial-up telephone line in call centers, and forensic applications where speaker recognition is required [4, 6]. The speaker identification task is to determine which speaker in a speaker database is speaking. In this task the unknown person does not claim an identity, so 1:N comparisons are required. The computational cost depends on the number of records in the voice database [8].
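
The 1:N comparison can be sketched as a linear scan over enrolled speakers. This is an illustrative assumption, not the method of the paper: each speaker is represented by a single hypothetical feature vector, the probe is matched by Euclidean distance, and the function name `identify` and the three-entry database are invented for the example.

```python
import math


def identify(database, probe):
    """Closed-set 1:N identification: compare the probe against every enrolled
    template and return the closest speaker plus the number of comparisons."""
    best_id, best_dist = None, math.inf
    comparisons = 0
    for speaker_id, template in database.items():
        comparisons += 1
        dist = math.dist(template, probe)
        if dist < best_dist:
            best_id, best_dist = speaker_id, dist
    return best_id, comparisons


# Hypothetical enrolled templates (toy 3-dimensional feature vectors).
database = {
    "speaker_a": [1.0, 2.0, 3.0],
    "speaker_b": [4.0, 0.5, -1.0],
    "speaker_c": [-2.0, 1.5, 0.0],
}
probe = [3.8, 0.7, -0.9]  # unknown voice; no identity is claimed

best, n = identify(database, probe)
print(best, n)
```

Because the probe carries no claimed identity, every enrolled record has to be checked, so the comparison count equals the database size N. That is the sense in which the cost grows with the number of records.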

Methods

Results

Conclusion