Abstract

Many studies in speech science have shown that the mel frequency scale matches speech perception more closely than the linear frequency scale does. Automatic speech recognition engineers have demonstrated empirically that features computed on the mel scale yield more accurate recognition than features computed on a linear frequency scale. The features most commonly used for automatic speech recognition are mel frequency cepstral coefficients (MFCCs), along with delta and acceleration terms that represent the temporal evolution of the MFCCs over very short time intervals. However, MFCC features do not encode the better temporal resolution that is possible at higher frequencies, where frequency resolution is low. In this paper, a two-dimensional feature set is presented that combines good frequency resolution and low time resolution at low frequencies with low frequency resolution and good time resolution at high frequencies. These features are computed from overlapping block...
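To make the frequency-resolution tradeoff concrete, the following minimal sketch divides a frequency range into bands of equal width on the mel scale and reports their widths in Hz. The abstract does not say which mel formula the paper uses; the common 2595 · log10(1 + f/700) variant is assumed here, and the function names, the 0 to 8000 Hz range, and the band count are hypothetical choices made only for illustration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mel (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min_hz: float, f_max_hz: float, n_bands: int) -> list[float]:
    """Return n_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    m_lo = hz_to_mel(f_min_hz)
    m_hi = hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / n_bands
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 1)]


# Illustrative values only: 10 bands covering 0 to 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi, width in zip(edges, edges[1:], widths):
    print(f"{lo:8.1f} Hz to {hi:8.1f} Hz  (width {width:7.1f} Hz)")
```

Under this assumed formula the bands widen steadily as frequency rises, so frequency resolution is fine at low frequencies and coarse at high frequencies. Wider bands can in principle support shorter analysis windows, which is the time-resolution headroom the abstract says standard MFCCs leave unused.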
