Automatic Speech Recognition (ASR) systems are increasingly used for voice-centric applications on mobile handheld and Voice over Internet Protocol (VoIP) devices, and with them grows the need to characterize ASR performance under different network impairments. Among these, speech and audio coding standards, used at different sampling and bit rates in practical systems, greatly affect ASR performance. Another common impairment that influences ASR accuracy is bit errors in wireless networks and packet drops in VoIP networks. ASR performance with some speech coding standards under noise conditions in wireless networks has been reported in the literature. However, each study reports ASR performance for only a few narrowband codecs, using different speech databases and different ASR toolkits such as RAPHEL, HTK, and SPHINX. This paper presents an analysis of ASR performance with both narrowband and wideband speech and audio coding standards currently accepted for GSM mobile and VoIP networks, using a common speech database (TIMIT) and a single ASR toolkit (SPHINX). The Mean Opinion Score (MOS), the generally accepted measure of speech quality, is also analyzed for all the speech and audio coding standards using the same speech database. Results are presented for ASR word accuracy and MOS values for the different narrowband and wideband speech and audio codecs under no-loss conditions. Results are also presented for different packet drop rates, packet drop being the common noise scenario in wired networks such as VoIP (which is also merging with wireless networks). We observe that although some codecs show poor MOS performance at lower bit rates, their ASR performance is comparable to that of other codecs at higher bit rates.