Field test evaluations and optimization of speaker independent speech recognition for telephone applications

Christian Gagnoulet, Christel Sorin

doi:10.3115/112405.112429

Abstract

This paper first presents the detailed results of several field evaluations of the CNET speaker-independent speech recognition system, used in two voice-activated servers accessible to the general French public over the telephone. The analysis of roughly 11,000 user tokens indicates that the rejection of incorrect input is a major problem and that the gap between the recognition rates observed in real use conditions and in the most realistic laboratory tests remains very large. The second part of the paper describes the current improvements to the system: better rejection procedures; improved recognition performance resulting from both the introduction of field data into the training data and an increase in the number of parameters; and automatic adjustment of the HMM topology, which makes it possible either to reduce overall model complexity or to improve recognition performance. Tested on long-distance telephone databases (450 to 750 speakers), the current version of the CNET recognition system yields a laboratory error rate of 0.7% on the 10 French digits and of 0.95% on a 36-word vocabulary.
