This study compares the performance of machine learning classifiers for speaker identification, a key component of modern digital security systems. As voice-activated interfaces become more widespread, the need for accurate and reliable speaker identification grows. We compare four widely used classifiers: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Decision Tree (DT). Using the LibriSpeech dataset, known for its diversity of speakers and recording conditions, we extracted Mel-frequency cepstral coefficients (MFCCs) as features for training and evaluating the classifiers. Each model was assessed on precision, recall, F1-score, and accuracy. RF outperformed all other classifiers, achieving near-perfect metrics, which indicates its robustness and generalizability for speaker identification tasks. KNN also performed well, suggesting its suitability for applications where rapid execution and interpretability are critical. SVM yielded moderate performance and DT the lowest, highlighting the need for further optimization of both. These findings underscore the effectiveness of ensemble and distance-based classifiers in handling the complex patterns that differentiate speakers. The study guides the selection of classifiers for speaker identification and sets the stage for future research on hybrid models and the impact of dataset variability on performance. The results provide a benchmark for developing advanced speaker identification systems.
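
The four evaluation metrics named above can be made concrete with a short sketch. The abstract does not say how the per-speaker scores were aggregated, so macro-averaging (an unweighted mean over speakers) is an assumption here, and the function name `classification_metrics` and the speaker labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the source does not state
    which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per speaker.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical speaker labels, not data from the study.
    y_true = ["spk1", "spk1", "spk2", "spk2", "spk3", "spk3"]
    y_pred = ["spk1", "spk2", "spk2", "spk2", "spk3", "spk1"]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

With many speakers, macro-averaging gives each speaker equal weight regardless of how many utterances they contribute. Whether that matches the scores reported in the study is not known from the abstract.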