Development of context dependent sequential K-nearest neighbor classifier for usable speech classification

J K Shah, B Y Smolenski, R E Yantorno, A N Iyer

doi:10.1121/1.1669313

Abstract

The accuracy of speech processing applications degrades when they operate in a co-channel environment. Co-channel speech occurs when more than one person is talking at the same time. The idea of usable speech segmentation is to identify and extract those portions of co-channel speech that are minimally degraded and therefore still useful for speech processing applications (such as speaker identification or speech recognition) that do not otherwise work in co-channel environments. Usable speech measures are features extracted from the co-channel signal to distinguish between usable and unusable speech. Several usable speech extraction methods have recently been developed, each based on a single feature of the speech signal. This paper instead investigates a new usable speech extraction technique that sequentially and contextually selects several features of the given signal using the K-nearest neighbor classifier. The new approach considers periodicity-based and structure-based features simultaneously in order to maximize the classification rate and, by observing all incoming frames, avoids the problem of deciding how much data is needed to make accurate decisions. An accuracy of 100% can be achieved in speech processing applications by using the extracted usable speech segments.
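As a rough illustration of the classification step named in the abstract, the following is a minimal sketch of a plain K-nearest neighbor vote over two per-frame features. It is not the authors' method: the sequential, context-dependent feature selection is not reproduced, and the function names (`knn_classify`, `classify_frames`), the two-dimensional feature layout (periodicity score, structure score), and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical training frames: (periodicity score, structure score) -> label.
# The numbers are made up for illustration; they are not from the paper.
TRAIN = [
    ((0.90, 0.80), "usable"),
    ((0.85, 0.90), "usable"),
    ((0.80, 0.75), "usable"),
    ((0.20, 0.30), "unusable"),
    ((0.30, 0.10), "unusable"),
    ((0.25, 0.20), "unusable"),
]


def knn_classify(train, query, k=3):
    """Label one frame by majority vote among its k nearest training frames."""
    # Sort training frames by Euclidean distance to the query feature vector.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest frames and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def classify_frames(frames, k=3):
    """Classify every incoming frame in order, one decision per frame."""
    return [knn_classify(TRAIN, frame, k) for frame in frames]


labels = classify_frames([(0.88, 0.82), (0.22, 0.25)])
print(labels)
```

Using an odd `k` avoids tied votes in a two-class problem. Combining both features in one distance computation mirrors the abstract's point that periodicity and structure information are considered together rather than one feature at a time.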

Full Text