Abstract

Both in speech and singing, humans are capable of generating sounds of different phonation types (e.g., breathy, modal and pressed). Previous studies in the analysis and classification of phonation types have mainly used voice source features derived using glottal inverse filtering (GIF). Even though glottal source features are useful in discriminating phonation types in speech, their performance deteriorates in singing voices because the high fundamental frequency of these sounds reduces the accuracy of the source-filter separation in GIF. In the present study, features describing the glottal source were computed using three signal processing methods that do not require source-filter separation: zero frequency filtering (ZFF), zero time windowing (ZTW) and single frequency filtering (SFF). From each method, a group of scalar features was extracted. In addition, cepstral coefficients were derived from the spectra computed using ZTW and SFF. Experiments were conducted with the proposed features to analyse and classify three phonation types (breathy, modal and pressed) in speech and singing voice. Statistical pair-wise comparisons showed that most of the features separated the phonation types significantly in both speech and singing voices. Classification with support vector machine classifiers indicated that the proposed features and their combinations achieved improved accuracy compared to commonly used glottal source features and mel-frequency cepstral coefficients (MFCCs).
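Of the three methods named above, zero frequency filtering is perhaps the easiest to illustrate: the differenced signal is passed through a cascade of zero-frequency resonators (ideal integrators) and the slowly growing trend is removed by repeated local-mean subtraction. The sketch below follows that general recipe; the window length and the number of trend-removal passes are illustrative assumptions, not parameters taken from this paper.

```python
import numpy as np

def zero_frequency_filter(x, fs, pitch_period_ms=5.0):
    """Minimal sketch of zero-frequency filtering (ZFF).

    The differenced signal is passed through a cascade of two
    zero-frequency resonators, 1/(1 - z^-1)^2 each, which is
    equivalent to integrating (cumulative-summing) four times.
    The resulting polynomial trend is removed by subtracting a
    local mean over roughly one to two average pitch periods.
    Zero crossings of the output approximate glottal closure
    instants. Window length and repetition count below are
    assumptions for illustration.
    """
    # Difference the signal to suppress any DC bias.
    dx = np.diff(np.asarray(x, dtype=np.float64), prepend=float(x[0]))

    # Cascade of two zero-frequency resonators = 4 integrations.
    y = dx
    for _ in range(4):
        y = np.cumsum(y)

    # Trend removal by repeated local-mean subtraction.
    win = max(3, int(fs * pitch_period_ms / 1000) | 1)  # odd length
    kernel = np.ones(win) / win
    for _ in range(3):
        y = y - np.convolve(y, kernel, mode="same")
    return y
```

Applied to a voiced frame, the negative-to-positive zero crossings of the returned signal can then be read off as epoch candidates, from which scalar source features (e.g., strength of excitation) are typically derived.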
