Abstract

Background
Speaker identification on datasets collected in the wild faces several challenges, including the scarcity of labelled data, noisy data, and recordings in which several people are talking. Previous work has focused on speaker identification using either large amounts of labelled data or semi‐supervised approaches in which a small amount of data is used as a training set. In these studies, however, the datasets contain only clean data, often with the same sentences repeated several times by each participant.

Method
We propose a new approach to fuzzy semi‐supervised clustering of Alexa data collected in the households of people living with dementia (PLwD). In this context, we wish to identify the carer and the PLwD, but also visitors. We extract x‐vectors from the audio data, apply PCA, and use the result as input features for our algorithm. The clusters are initialised with a small amount of data from each identified speaker (carer, PLwD). New data is added iteratively, and membership scores are computed for each cluster according to the Euclidean distance between the data point and each cluster. Centroids are updated every 30 iterations, considering only the points with high membership. Membership is then also updated using the new centroids and the Mahalanobis distance.

Result
Our method performs comparably to unsupervised and supervised learning on a two‐class problem (carer and PLwD), with 100% accuracy and 96% recall in identifying the PLwD with a training set of 5 files. When visitors are considered as an unknown class, unsupervised learning fails. Our method performs similarly to supervised learning in identifying the carer and PLwD (97% accuracy and 80% recall), but is better than supervised learning at identifying visitors as unknown speakers (around 40% and 4% accuracy, respectively).

Conclusion
We propose a new method for speaker identification on data collected in the wild. We endeavour to address the shortcomings of supervised learning, which tends to overfit, and of unsupervised learning, which does not perform well with unknown classes.
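
The clustering loop outlined in the Method part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the membership formula, the threshold for "high membership", or the number of PCA components, so the inverse-distance membership, the `high=0.8` cut-off, the covariance regularisation `reg`, and the function names `pca_fit_transform`, `memberships`, and `fuzzy_semi_supervised` are all assumptions made for illustration. Synthetic Gaussian data stands in for x‐vectors, and the unknown (visitor) class is not handled.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def memberships(x, centroids, inv_covs=None):
    """Fuzzy membership of x in each cluster (assumed form: normalised
    inverse distance). Euclidean if inv_covs is None, else Mahalanobis."""
    diffs = centroids - x
    if inv_covs is None:
        d = np.linalg.norm(diffs, axis=1)
    else:
        d = np.sqrt(np.einsum("ki,kij,kj->k", diffs, inv_covs, diffs))
    w = 1.0 / (d + 1e-9)
    return w / w.sum()

def fuzzy_semi_supervised(seed_sets, stream, update_every=30, high=0.8, reg=1e-3):
    """seed_sets: one (n_k, d) array of labelled points per known speaker.
    stream: (n, d) array of unlabelled points, processed one at a time.
    Assumes d >= 2 and at least two seed points per speaker."""
    seeds = [np.asarray(s, dtype=float) for s in seed_sets]
    centroids = np.stack([s.mean(axis=0) for s in seeds])
    dim = centroids.shape[1]
    seen, scores = [], []
    for i, x in enumerate(stream, start=1):
        # New point: membership from Euclidean distance to each centroid.
        seen.append(x)
        scores.append(memberships(x, centroids))
        if i % update_every == 0:
            X, U = np.array(seen), np.array(scores)
            inv_covs = np.empty((len(seeds), dim, dim))
            for k, seed in enumerate(seeds):
                # Update centroid from seeds plus high-membership points only.
                members = np.vstack([seed, X[U[:, k] >= high]])
                centroids[k] = members.mean(axis=0)
                cov = np.cov(members, rowvar=False) + reg * np.eye(dim)
                inv_covs[k] = np.linalg.inv(cov)
            # Refresh memberships of all seen points with Mahalanobis distance.
            scores = [memberships(p, centroids, inv_covs) for p in seen]
    return centroids, np.array(scores)

# Synthetic stand-in for x-vectors: two speakers, 16-dimensional features.
rng = np.random.default_rng(0)
raw = np.vstack([rng.normal(0.0, 1.0, (65, 16)), rng.normal(4.0, 1.0, (65, 16))])
feats = pca_fit_transform(raw, n_components=4)
seed_sets = [feats[:5], feats[65:70]]            # 5 labelled files per speaker
stream = np.vstack([feats[5:65], feats[70:]])
truth = np.array([0] * 60 + [1] * 60)
order = rng.permutation(len(stream))
centroids, scores = fuzzy_semi_supervised(seed_sets, stream[order])
print("agreement with ground truth:", (scores.argmax(axis=1) == truth[order]).mean())
```

Because the normalised memberships always sum to one across the known speakers, this sketch cannot by itself flag a visitor; one plausible extension, again an assumption, would be to label a point as unknown when its distance to every centroid exceeds a threshold.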
