A small Brazilian speech corpus was created for educational purposes to study a state-of-the-art speaker recognition system. The system uses the Gaussian Mixture Model (GMM) as a statistical model for speakers and employs the Mel-frequency cepstral coefficients (MFCC) as acoustic features. The results using clean and noisy speech are compatible with the expected results, showing that the bigger the mismatch between training and test conditions, the worse the results. The results also improve with the increase in the utterance length. Finally, the obtained results can be used as baselines to compare with other speaker statistical models created with different acoustic features in different acoustic conditions.
Read full abstract