Abstract

• Control of composite tasks by adaptive heuristically enhanced gradient approximation.
• Training an artificial neural network (ANN) to control the parameters of a speech synthesizer.
• The training error is estimated from the “quality” of the produced speech and not at the outputs of the ANN.
• Currently used ANN training algorithms cannot cope with the above requirement.
• The presented methodology may also be applied to other tasks, e.g., coordination of robotic manipulators.

This work is inspired by the ability of neural systems to control the behavior of specialized actuator mechanisms in living organisms by monitoring the end-effect of their actions. As an example of such an actuator mechanism, we consider the human vocal tract, where neurons learn to activate the muscles that move the velum, jaw, tongue, and lips in order to produce the desired phonetic activity. As a technical approximation to such a setup, we use an artificial neural network (ANN) and a speech synthesizer, and we study the capability of the ANN to estimate the synthesizer’s parameters for a desired speech activity. In this setup, we assume that the training error is obtained by measuring the “perceived distance” between the original (target) and the synthesized speech signals. Thus, the training error must be measured after the output of the speech synthesizer has been processed, rather than directly at the outputs of the ANN. This operational requirement on error measurement restricts the application of widely used ANN training algorithms that are based on backpropagation of gradients, but it can be met by our earlier proposed “Heuristically Enhanced Gradient Approximation” (HEGA) algorithm. We also propose enhancements to HEGA that further optimize its performance in this demanding application.
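The setup described above, where the error is available only after the synthesizer, can be illustrated with a minimal sketch. It is not the HEGA algorithm, whose details are not given here; it uses plain central finite differences as a generic stand-in for gradient approximation. The toy network `ann`, the black-box `synthesize`, and the mean-squared `perceived_distance` are all hypothetical placeholders chosen for illustration, as are the learning rate and step count.

```python
import math
import random

def ann(weights, x):
    """Toy one-layer network (assumption): maps input x to two synthesizer parameters."""
    w0, b0, w1, b1 = weights
    return [math.tanh(w0 * x + b0), math.tanh(w1 * x + b1)]

def synthesize(params, n=32):
    """Hypothetical black-box synthesizer: (amplitude, frequency offset) -> waveform."""
    amp, freq = params
    return [amp * math.sin(2 * math.pi * (1.0 + freq) * t / n) for t in range(n)]

def perceived_distance(target, produced):
    """Stand-in for a perceptual measure: mean squared error between two signals."""
    return sum((a - b) ** 2 for a, b in zip(target, produced)) / len(target)

def end_effect_error(weights, x, target):
    # The error is measured after the synthesizer, not at the ANN outputs,
    # so no gradient is propagated back through synthesize().
    return perceived_distance(target, synthesize(ann(weights, x)))

def approx_gradient(weights, x, target, eps=1e-4):
    """Central finite differences: two black-box evaluations per weight."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        diff = end_effect_error(up, x, target) - end_effect_error(down, x, target)
        grad.append(diff / (2 * eps))
    return grad

def train(weights, x, target, steps=300, lr=0.02):
    """Plain gradient descent driven by the approximated gradient."""
    for _ in range(steps):
        g = approx_gradient(weights, x, target)
        weights = [w - lr * gi for w, gi in zip(weights, g)]
    return weights

random.seed(0)
x = 1.0
target = synthesize([0.8, 0.3])  # target signal from known (illustrative) parameters
w_init = [random.uniform(-0.1, 0.1) for _ in range(4)]
w_final = train(w_init, x, target)

initial_error = end_effect_error(w_init, x, target)
final_error = end_effect_error(w_final, x, target)
print(f"error before: {initial_error:.4f}, after: {final_error:.4f}")
```

The sketch only needs to evaluate the synthesizer and the distance measure, never to differentiate them, which is the property the abstract requires. Its cost grows with the number of weights (two evaluations per weight per step), which is one reason a more economical approximation scheme would be needed for a realistic network.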
