Improving children speech recognition in acoustically mismatched condition using eigenvoices and feature projections

Hemant Kumar Kathania, Rohit Sinha, S Shahnawazuddin

doi:10.1109/ncc.2017.8077103

Abstract

The automatic recognition of children's speech in acoustically mismatched conditions is a challenging problem because of the large differences between adults' and children's speech. In the literature, this challenge is often addressed by combining various feature- and model-domain adaptation methods such as vocal tract length normalization (VTLN), maximum likelihood linear regression (MLLR) and heteroscedastic linear discriminant analysis (HLDA). However, a significant gap between the recognition performance for adults and for children still remains. This work explores eigenvoice (EV) based adaptation to reduce this gap when children's speech is recognized using acoustic models trained on adults' speech. EV is a fast adaptation approach and enables effective gender biasing of the acoustic models. Combining EV with VTLN, MLLR and HLDA under the mismatched condition yields an absolute improvement of about 50% over the unadapted speaker-independent system, thereby significantly reducing the gap between the performance for adults and for children.
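The abstract does not spell out how EV adaptation works. Below is a minimal sketch, assuming the standard eigenvoice formulation: each speaker's Gaussian means are stacked into a supervector, eigenvoices are the top principal directions of the training speakers' supervectors, and a new speaker is modelled as the average supervector plus a weighted sum of K eigenvoices. The weights are estimated by maximum-likelihood eigen-decomposition (MLED) from the adaptation data's occupation counts and first-order statistics, assuming diagonal covariances. Because only K weights are estimated, very little adaptation data is needed, which is why EV is fast. The function names, array shapes and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def train_eigenvoices(speaker_means, num_eigenvoices):
    """Derive eigenvoices by PCA over speaker-dependent mean supervectors.

    speaker_means: (S, M, d) per-speaker Gaussian means (assumed input, e.g.
    speaker-adapted means of each training speaker).
    Returns the average means (M, d) and eigenvoices (K, M, d).
    """
    S, M, d = speaker_means.shape
    supervectors = speaker_means.reshape(S, M * d)
    mean_sv = supervectors.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centred supervectors.
    _, _, vt = np.linalg.svd(supervectors - mean_sv, full_matrices=False)
    eigenvoices = vt[:num_eigenvoices]
    return mean_sv.reshape(M, d), eigenvoices.reshape(num_eigenvoices, M, d)


def mled_weights(mean, eigenvoices, variances, gamma, first_order):
    """MLED estimate of the eigenvoice weights for one target speaker.

    mean: (M, d); eigenvoices: (K, M, d); variances: (M, d) diagonal covariances;
    gamma: (M,) occupation counts; first_order: (M, d) posterior-weighted sums
    of the adaptation frames. Solves A w = b with
      A = sum_m gamma_m E_m S_m^-1 E_m^T,  b = sum_m E_m S_m^-1 (F_m - gamma_m mu_m).
    """
    inv_var = 1.0 / variances
    A = np.einsum("m,kmj,lmj,mj->kl", gamma, eigenvoices, eigenvoices, inv_var)
    centred = first_order - gamma[:, None] * mean
    b = np.einsum("kmj,mj->k", eigenvoices, inv_var * centred)
    return np.linalg.solve(A, b)


def adapt_means(mean, eigenvoices, weights):
    """Adapted means: average voice plus a weighted sum of eigenvoices."""
    return mean + np.einsum("k,kmd->md", weights, eigenvoices)


if __name__ == "__main__":
    # Synthetic demo: 20 training speakers, 8 Gaussians of dimension 3, K = 2.
    rng = np.random.default_rng(0)
    mu, E = train_eigenvoices(rng.normal(size=(20, 8, 3)), 2)
    target = adapt_means(mu, E, np.array([1.5, -0.7]))  # a speaker in the EV space
    gamma = np.full(8, 10.0)                            # frames assigned per Gaussian
    w = mled_weights(mu, E, np.ones((8, 3)), gamma, gamma[:, None] * target)
    print("estimated weights:", w)
```

With ideal statistics for a speaker lying in the eigenvoice space, MLED recovers that speaker's weights exactly; with real adaptation data the estimate is a projection of the speaker onto the K-dimensional eigenspace.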
