Abstract

This article presents research on speaker segmentation for meeting rooms. It examines the algorithms and implementation of an offline speaker segmentation and clustering system for meeting recordings, where usually only a single distant microphone is available. The aim of this work is to improve the performance and practicality of speaker segmentation in meeting recordings using a clustering algorithm. Speaker identification consists of three main parts. It first performs speech/non-speech detection, followed by extraction of speaker information using Mel Frequency Cepstral Coefficient (MFCC) features. We observe that MFCC features give good performance and recognize the speakers effectively. Speaker segmentation using MFCC features yields an average segmentation accuracy of 78%. However, slight confusions remain in which segments of some speakers are assigned wrong labels. These cases are handled with the help of a clustering algorithm: the sequence of neighbouring segments is observed, and the label of a confused segment is replaced by the nearest speaker label. This improves the overall segmentation accuracy to 96%. The deviation from 100% accuracy can be attributed to two primary causes: the boundary conditions at speaker transitions, and segments of speakers with major confusions, where deciding which label to substitute is difficult.
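The neighbour-based relabelling step can be illustrated with a minimal sketch. The abstract does not specify the exact rule, so everything here is an assumption made for illustration: the function name `smooth_labels`, the `context` window size, and the strict-majority vote are hypothetical and are not the authors' method.

```python
from collections import Counter


def smooth_labels(labels, context=2):
    """Relabel isolated segments by a majority vote of neighbouring segments.

    labels  -- speaker labels, one per segment, in time order
    context -- neighbours considered on each side (hypothetical parameter)
    """
    smoothed = list(labels)
    for i, current in enumerate(labels):
        # Votes come from the original labels, not from already-smoothed ones.
        left = labels[max(0, i - context):i]
        right = labels[i + 1:i + 1 + context]
        neighbours = left + right
        if not neighbours:
            continue
        winner, votes = Counter(neighbours).most_common(1)[0]
        # Assumed rule: replace only when a strict majority of the
        # neighbours agree on a different speaker.
        if winner != current and votes > len(neighbours) / 2:
            smoothed[i] = winner
    return smoothed


segments = ["A", "A", "B", "A", "A", "C", "C", "C"]
print(smooth_labels(segments))
```

In this toy sequence the isolated "B" segment is absorbed into the surrounding "A" run, while the segments at the A-to-C transition are left unchanged because no label holds a strict majority there. That mirrors the two residual error sources named in the abstract, although the real system may resolve such cases differently.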
