Abstract

Phoneme classification is the process of assigning a phonetic category to a short section of a speech signal. It is a key stage in applications such as spoken term detection, continuous speech recognition, and music-to-lyrics synchronization, but it can also be useful on its own, for example in the professional music industry and in applications for the hearing impaired. In this study we present an effective algorithm for classifying one group of phonemes, the unvoiced fricatives, which are characterized by a relatively large amount of spectral energy in the high-frequency range. Distinguishing between individual phonemes within this group is fairly difficult because their acoustic-phonetic characteristics are quite similar. We use a three-stage algorithm to classify the unvoiced fricatives. In the first, preprocessing stage, each phoneme segment is divided into consecutive non-overlapping short windowed frames, each of which is represented by a 15-dimensional feature vector. In the second stage, a support vector machine (SVM) is trained using a radial basis function kernel and an automatic grid search to optimize the SVM parameters. In the third, classification stage, a tree-based algorithm is used: the phonemes are first classified into two subgroups according to their articulation, the sibilants (/s/ and /sh/) and the nonsibilants (/f/ and /th/), and each subgroup is then further classified using another SVM. To evaluate the performance of the algorithm, we used more than 11000 phonemes extracted from the TIMIT speech database. Using a majority vote over the feature vectors of the same phoneme, an overall accuracy of 85% is obtained (91% for the subset /s/, /sh/ and /f/). These results are comparable to, and somewhat better than, those achieved in other studies. The efficiency and robustness of the algorithm make it suitable for real-time applications for the hearing impaired or in recording studios.
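The tree-based decision and the per-phoneme majority vote can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the callable interface, and the toy threshold rules standing in for the trained SVMs are all hypothetical, and the sketch assumes the vote is taken over the final per-frame labels, which the abstract does not specify.

```python
from collections import Counter
from typing import Callable, Sequence

FeatureVector = Sequence[float]
FrameClassifier = Callable[[FeatureVector], str]


def classify_frame(
    x: FeatureVector,
    is_sibilant: Callable[[FeatureVector], bool],
    sibilant_clf: FrameClassifier,
    nonsibilant_clf: FrameClassifier,
) -> str:
    """Route one frame through the two-level tree.

    The root decides sibilant vs. nonsibilant; the matching leaf
    classifier then picks the phoneme within that subgroup.
    """
    if is_sibilant(x):
        return sibilant_clf(x)      # expected to return "s" or "sh"
    return nonsibilant_clf(x)       # expected to return "f" or "th"


def classify_phoneme(
    frames: Sequence[FeatureVector],
    is_sibilant: Callable[[FeatureVector], bool],
    sibilant_clf: FrameClassifier,
    nonsibilant_clf: FrameClassifier,
) -> str:
    """Label a phoneme segment by majority vote over its frame labels.

    Assumption: the vote is over final per-frame labels.
    """
    if not frames:
        raise ValueError("a phoneme segment needs at least one frame")
    labels = [
        classify_frame(x, is_sibilant, sibilant_clf, nonsibilant_clf)
        for x in frames
    ]
    return Counter(labels).most_common(1)[0][0]


# Toy stand-ins for the three trained SVMs (hypothetical threshold rules
# on 2-dimensional vectors; the paper uses 15-dimensional features).
def toy_is_sibilant(x: FeatureVector) -> bool:
    return x[0] > 0.5


def toy_sibilant_clf(x: FeatureVector) -> str:
    return "s" if x[1] > 0 else "sh"


def toy_nonsibilant_clf(x: FeatureVector) -> str:
    return "f" if x[1] > 0 else "th"


# Three frames of one segment: two are labeled "s", one is labeled "f".
toy_frames = [(0.9, 1.0), (0.8, 0.5), (0.2, 1.0)]
toy_label = classify_phoneme(
    toy_frames, toy_is_sibilant, toy_sibilant_clf, toy_nonsibilant_clf
)
```

In practice, each of the three callables would wrap the prediction function of a trained classifier. Passing them in as arguments keeps the tree routing and the voting logic independent of any particular SVM library.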
