The difference in fundamental frequency (F0) between vowels is a segregation cue for identifying concurrent vowels. In younger listeners with normal hearing (YNH), the percent identification of both vowels improved with increasing F0 difference and asymptoted at larger F0 differences. The current study developed a deep-neural-network model consisting of a time-delay neural network with multitask learning (TDNN-MTL) to predict concurrent-vowel scores. The inputs to the TDNN-MTL were temporal responses to concurrent vowels obtained from an auditory-nerve model. The TDNN-MTL was trained at a 3-Hz F0 difference until its score (80%) approached the YNH score (85%). The TDNN-MTL learned the formant coding and aided the segregation of concurrent vowels into dominant and recessive vowels. The total weighted loss was tailored to mimic the dominant-recessive relationship of the YNH scores. The TDNN-MTL was then tested on the remaining five F0 differences. The TDNN-MTL scores for both vowels successfully predicted the YNH scores. Chi-square tests revealed that the TDNN-MTL scores agreed with the YNH scores more closely than those of the F0-segregation and multilayer-perceptron models. The TDNN-MTL also correctly predicted the one-vowel identification scores and the F0 benefit. These findings suggest that the trained TDNN-MTL accurately captures the formant coding of concurrent vowels across F0 differences, which aided in predicting the YNH scores. The TDNN-MTL can be extended to validate behavioral studies of concurrent vowels across acoustic changes (e.g., vowel duration and level) and auditory changes (e.g., aging and hearing loss).
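The architecture and loss design summarized above can be illustrated with a minimal, hypothetical sketch: stacked dilated 1-D convolutions serve as time-delay layers over the auditory-nerve response frames, and two task-specific heads identify the dominant and recessive vowels, trained with a weighted sum of per-task losses. The channel count, vowel-set size, layer widths, and loss weights below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical TDNN-MTL sketch (assumed parameters, not the published model).
import torch
import torch.nn as nn

class TDNNMTL(nn.Module):
    def __init__(self, n_channels=60, n_vowels=5):
        super().__init__()
        # Time-delay layers: 1-D convolutions over time with increasing
        # dilation, equivalent to widening temporal context windows.
        self.tdnn = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)  # pool over time frames
        # Multitask heads: one classifier per vowel role.
        self.dominant_head = nn.Linear(128, n_vowels)
        self.recessive_head = nn.Linear(128, n_vowels)

    def forward(self, x):
        # x: (batch, n_channels, n_frames) auditory-nerve temporal responses
        h = self.pool(self.tdnn(x)).squeeze(-1)
        return self.dominant_head(h), self.recessive_head(h)

def weighted_mtl_loss(logits_dom, logits_rec, y_dom, y_rec,
                      w_dom=0.7, w_rec=0.3):
    # Weighted sum of the two task losses; the weights (assumed values)
    # favor the dominant vowel, mimicking the dominant-recessive
    # relationship of the YNH scores described in the abstract.
    ce = nn.CrossEntropyLoss()
    return w_dom * ce(logits_dom, y_dom) + w_rec * ce(logits_rec, y_rec)
```

In this sketch, training at a single F0 difference and evaluating at the others would amount to calling `weighted_mtl_loss` on 3-Hz stimuli only, then scoring both heads on the remaining F0 conditions; the specific weighting and vowel inventory used in the study are not given in the abstract.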