Abstract

This study examines the role of frequencies above 8 kHz in the classification of conversational speech fricatives [f, v, θ, ð, s, z, ʃ, ʒ, h] in random forest modeling. Prior research has mostly focused on spectral measures for fricative categorization using frequency information below 8 kHz. The contribution of higher frequencies has received only limited attention, especially for non-laboratory speech. In the present study, we use a corpus of sociolinguistic interview recordings from Western Canadian English sampled at 44.1 and 16 kHz. For both sampling rates, we analyze spectral measures obtained using Fourier analysis and the multitaper method, and we also compare models without and with amplitudinal measures. Results show that while frequency information above 8 kHz does not improve classification accuracy in random forest analyses, inclusion of such frequencies can affect the relative importance of specific measures. This includes a decreased contribution of center of gravity and an increased contribution of spectral standard deviation for the higher sampling rate. We also find no major differences in classification accuracy between Fourier and multitaper measures. The inclusion of power measures improves model accuracy but does not change the overall importance of spectral measures.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call