Speaker recognition has multiple real-life applications. The purpose of this study is to determine the feasibility of classifying samples of human speech, specifically speech from Spanish speakers, based on their distinctive accents. In this work, Mel-Frequency Cepstral Coefficients (MFCCs) combined with machine learning techniques were used to identify the nationality of Spanish-speaking individuals from voice recordings in the Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech corpus. Data preprocessing consisted of extracting 50 MFCCs from each recording; these features formed the dataset for experimentation. Experiments were conducted with different subsets of the data, and the best results were obtained with speakers, both male and female, from four Latin American countries. Neural networks were employed for the classification stage, achieving an accuracy of 99.84%.
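The feature-extraction step described above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: the abstract does not say how the 50 coefficients per recording were obtained, so the sketch assumes per-frame MFCCs averaged over time. The function name `mfcc_vector`, the 16 kHz sample rate, the 512-sample frame, the 256-sample hop, and the 64 mel filters are all assumed values chosen for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sr / 2)
    # Filter edge frequencies, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sr
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((b - lo) / (mid - lo), (hi - b) / (hi - mid)))
                     for b in range(n_bins)])
    return bank


def mfcc_vector(signal, sr=16000, n_mfcc=50, n_fft=512, hop=256, n_filters=64):
    """Return one n_mfcc-long vector: per-frame MFCCs averaged over time.

    Averaging over frames is an assumption, not taken from the paper.
    """
    # Hamming window applied to every frame before the FFT.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_fft - 1))
              for i in range(n_fft)]
    bank = mel_filterbank(n_filters, n_fft, sr)
    # Unnormalized DCT-II basis: maps log mel energies to cepstral coefficients.
    dct = [[math.cos(math.pi * k * (2 * j + 1) / (2 * n_filters))
            for j in range(n_filters)] for k in range(n_mfcc)]
    totals = [0.0] * n_mfcc
    n_frames = 0
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        power = [abs(c) ** 2 for c in fft(frame)[: n_fft // 2 + 1]]
        # Small floor avoids log(0) for filters that capture no energy.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        for k in range(n_mfcc):
            totals[k] += sum(d * e for d, e in zip(dct[k], log_mel))
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [t / n_frames for t in totals]


if __name__ == "__main__":
    sr = 16000
    # Synthetic 440 Hz tone standing in for a real recording.
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]
    print(len(mfcc_vector(tone, sr=sr)))  # 50
```

In practice a library routine such as `librosa.feature.mfcc` would typically replace the hand-written transform. The resulting 50-value vectors, one per recording, would then serve as inputs to the classifier.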