Abstract

A Karhunen‐Loeve (KL) series expansion was used to block encode speech spectral principal components as a function of time. Each of ten principal components was first obtained as a linear combination of 20 speech spectral band energies. Using a fixed block length of ten frames (0.128 s), the KL basis vectors were computed separately for various speakers for each principal component. However, the optimal KL basis vector set was essentially the same for each principal component and for the different speakers. The basis vector set also closely resembled a cosine basis vector set. Approximately 94% of the variance of the principal components was accounted for by five (out of ten) basis vectors. Speech was synthesized using the KL basis vectors for block encoding of ten‐frame blocks of principal components. Informal listening tests indicate that very little information is lost using five basis vectors. These results indicate that speech spectral principal components, particularly the low‐ordered ones which ref...
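The block-encoding procedure summarized in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the speech data are not available here, so a synthetic first-order autoregressive sequence (hypothetical, with an arbitrary correlation `rho`) stands in for one principal-component trajectory, and every function name is invented for the example. The block length of ten frames and the choice of five retained basis vectors come from the abstract; the numbers the script produces are properties of the synthetic data and are not the paper's results.

```python
import numpy as np

BLOCK_LEN = 10  # frames per block, as stated in the abstract
N_KEEP = 5      # basis vectors retained, as stated in the abstract


def synthetic_track(n_frames, rho=0.9, seed=0):
    """Hypothetical stand-in for one principal-component trajectory (AR(1) sequence)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_frames)
    track = np.empty(n_frames)
    track[0] = noise[0]
    for i in range(1, n_frames):
        track[i] = rho * track[i - 1] + noise[i]
    return track


def make_blocks(track, block_len=BLOCK_LEN):
    """Cut a 1-D trajectory into non-overlapping blocks, one block per row."""
    n_blocks = len(track) // block_len
    return track[: n_blocks * block_len].reshape(n_blocks, block_len)


def kl_basis(blocks):
    """Eigen-decompose the block covariance; columns are KL vectors, largest variance first."""
    centered = blocks - blocks.mean(axis=0)
    cov = centered.T @ centered / (len(blocks) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def cosine_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine vector."""
    t = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    basis[:, 0] *= np.sqrt(1.0 / n)
    basis[:, 1:] *= np.sqrt(2.0 / n)
    return basis


track = synthetic_track(20000)
blocks = make_blocks(track)
eigvals, basis = kl_basis(blocks)

# Fraction of block variance captured by the first N_KEEP KL vectors.
explained = eigvals[:N_KEEP].sum() / eigvals.sum()

# Block encode (project onto N_KEEP vectors) and decode (reconstruct).
mean = blocks.mean(axis=0)
coeffs = (blocks - mean) @ basis[:, :N_KEEP]
reconstructed = coeffs @ basis[:, :N_KEEP].T + mean

# |inner product| between each KL vector and the cosine vector of the same index.
similarity = np.abs(np.sum(basis * cosine_basis(BLOCK_LEN), axis=0))

print(f"variance explained by {N_KEEP} vectors: {explained:.3f}")
print("similarity to cosine basis:", np.round(similarity, 3))
```

Each block of ten frames is then represented by five coefficients instead of ten values. The `similarity` vector gives a rough way to check, on real data, the abstract's observation that the KL vectors resemble a cosine basis.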
