Abstract

A listening test is proposed in which human participants detect talker changes in two natural, multi-talker speech stimuli sets-a familiar language (English) and an unfamiliar language (Chinese). Miss rate, false-alarm rate, and response times (RT) showed a significant dependence on language familiarity. Linear regression modeling of RTs using diverse acoustic features derived from the stimuli showed recruitment of a pool of acoustic features for the talker change detection task. Further, benchmarking the same task against the state-of-the-art machine diarization system showed that the machine system achieves human parity for the familiar language but not for the unfamiliar language.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call