Abstract

In this paper, we employ learned dictionaries to compute sparse representations of speech utterances, which are used to reduce the footprint of unit selection based speech synthesis (USS) systems. A speech database labeled at the phoneme level is used to obtain multiple examples of each phoneme, and all the examples of a given phoneme are then used to learn a single overcomplete dictionary for that phoneme. Two dictionary learning algorithms, K-SVD (K-singular value decomposition) and GAD (greedy adaptive dictionary), are employed to obtain the respective sparse representations. The learned dictionaries are then used to compute the sparse vector for each speech unit of an utterance. The significant coefficients of the sparse vector (along with their index locations) and the learned dictionaries are stored instead of the entire speech utterance. During synthesis, the speech waveform is reconstructed from the significant coefficients of the sparse vector and the corresponding dictionary. Experimental results demonstrate that the proposed approach yields better synthesized-speech quality while achieving compression comparable to the existing compression methods employed in USS systems.
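The storage-and-synthesis scheme described above — keep only the significant coefficients of each unit's sparse vector plus their index locations, then rebuild the waveform as a dictionary–coefficient product — can be sketched as follows. This is a minimal illustration only: plain matching pursuit stands in for the paper's K-SVD/GAD-based coding, and a random dictionary and synthetic frame stand in for learned phoneme dictionaries and real speech.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K, k = 64, 128, 8          # frame length, dictionary atoms, coefficients kept
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms, as dictionary learning assumes

# Synthetic "speech frame": a sparse combination of a few atoms plus noise
# (in the paper this would be a frame of a phoneme-labeled speech unit).
true_s = np.zeros(K)
true_s[rng.choice(K, 4, replace=False)] = rng.standard_normal(4)
x = D @ true_s + 0.01 * rng.standard_normal(n)

# Plain matching pursuit: greedily select k atoms to form the sparse vector.
s = np.zeros(K)
r = x.copy()
for _ in range(k):
    idx = int(np.argmax(np.abs(D.T @ r)))  # atom most correlated with residual
    coef = D[:, idx] @ r
    s[idx] += coef
    r -= coef * D[:, idx]

# Footprint reduction: store only k (index, value) pairs instead of n samples.
support = np.flatnonzero(s)
stored = (support, s[support])

# Synthesis: rebuild the frame from the stored pairs and the dictionary.
x_hat = D[:, stored[0]] @ stored[1]
err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"kept {len(support)}/{K} coefficients, relative error {err:.3f}")
```

In a full system, one such dictionary would be learned per phoneme and the stored index/value pairs per unit would replace the raw waveform database.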
