Abstract

Children's speech processing is more challenging than that of adults due to lacking of large scale children's speech corpora. With the developing of the physical speech organ, high inter speaker and intra speaker variabilities are observed in children's speech. On the other hand, data collection on children is difficult as children usually have short attention span and their language proficiency is limited. In this paper, we propose to improve children's automatic speech recognition performance with transfer learning technique. We compare two transfer learning approaches in enhancing children's speech recognition performance with adults' data. The first method is to perform acoustic model adaptation on the pre-trained adult model. The second is to train acoustic model with deep neural network based multi-task learning approach: the adults' and children's acoustic characteristics are learnt jointly in the shared hidden layers, while the output layers are optimized with different speaker groups. Our experiment results show that both transfer learning approaches are effective in transferring rich phonetic and acoustic information from adults' model to children model. The multi-task learning approach outperforms the acoustic adaptation approach. We further show that the speakers' acoustic characteristics in languages can also benefit the target language under the multi-task learning framework.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call