Abstract

Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared in speaker-dependent, speaker-independent (or cross-speaker) recognition. The perceptually based linear predictive (PLP) front-end, with the root-power sums (RPS) distance measure, yields generally the highest accuracies, especially in cross-speaker recognition., It is experimentally shown that one can optimize the system and further improve recognition accuracy for speaker-independent recognition by controlling the distance measure's sensitivity to spectral peaks and the spectral tilt and by utilizing the speech dynamic features. For a digit vocabulary and five reference templates obtained with a clustering algorithm, the optimization improves recognition accuracy from 97% to 98.1%, with respect to the PL-PRPS front-end.< <ETX xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">&gt;</ETX>

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call