Abstract

This paper proposes an approach based on compressed sensing to reduce the footprint of the speech corpus in unit-selection-based speech synthesis (USS) systems. It exploits the observation that a speech signal can have a sparse representation, given a suitable choice of basis functions, and that this representation can be estimated effectively within the sparse coding framework. Thus, only a few significant coefficients of the sparse vector need to be stored instead of the entire speech signal. During synthesis, the speech signal can be reconstructed with low error from these significant coefficients alone. Furthermore, the number of significant coefficients can be chosen adaptively according to the type of segment, such as voiced or unvoiced. Simulation results suggest that the proposed compression method preserves most of the spectral information and can serve as an alternative to the compression methods currently used in USS systems.
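
The store-few-coefficients idea can be illustrated with a minimal sketch. It is not the paper's method: it swaps in plain thresholding of an orthonormal DCT-II transform for the compressed-sensing and sparse-coding estimation, and the frame length `N`, the per-class budgets in `KEEP`, the synthetic test frame, and all function names are assumptions made for illustration only.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def compress_frame(frame, basis, keep):
    """Keep only the `keep` largest-magnitude coefficients as (index, value)."""
    coeffs = [sum(b * x for b, x in zip(row, frame)) for row in basis]
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return [(k, coeffs[k]) for k in sorted(order[:keep])]

def reconstruct_frame(stored, basis, n):
    """Rebuild a frame as the weighted sum of the stored basis functions."""
    frame = [0.0] * n
    for k, c in stored:
        row = basis[k]
        for i in range(n):
            frame[i] += c * row[i]
    return frame

def relative_error(original, estimate):
    """Euclidean reconstruction error relative to the original frame's norm."""
    num = sum((a - b) ** 2 for a, b in zip(original, estimate))
    den = sum(a * a for a in original)
    return math.sqrt(num / den)

N = 64                                  # assumed frame length in samples
BASIS = dct_basis(N)
KEEP = {"voiced": 8, "unvoiced": 24}    # hypothetical per-class budgets

# Synthetic stand-in for a voiced frame: two harmonically related sinusoids.
voiced = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 6 * i / N)
          for i in range(N)]

stored = compress_frame(voiced, BASIS, KEEP["voiced"])
approx = reconstruct_frame(stored, BASIS, N)
print(len(stored), "of", N, "coefficients kept; relative error:",
      relative_error(voiced, approx))
```

Because the basis here is orthonormal, the reconstruction error can only stay the same or shrink as the budget grows, which is what makes a per-segment budget (smaller for voiced frames, larger for noise-like unvoiced frames) a natural tuning knob in this simplified setting.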
