Abstract

We applied several types of time-delay neural networks (TDNNs), which are generally used for speaker-dependent and multispeaker speech recognition, to speaker-independent speech recognition and compared their performance. Each network was trained on six or 12 speakers, and recognition experiments on the voiced stops /b, d, g/ were performed in open-speaker mode, i.e., with test speakers not included in the training set. The best recognition rates were 91.3 percent with six training speakers and 93.6 percent with 12. We found that constructing modular networks, such as a modular TDNN in which each module corresponds to one training speaker, effectively reduces the number of training iterations needed and yields slightly better performance than a single TDNN of comparable network capacity. This is because the modular networks make effective use of their limited capacity. On the other hand, a single TDNN with an increased number of hidden units achieved a recognition rate comparable to that of the modular TDNN.
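The abstract does not state the network dimensions. As a rough illustration of what a time-delay layer computes, the sketch below borrows the layer sizes of the original /b, d, g/ TDNN described by Waibel et al. (15 input frames of 16 spectral coefficients, 8 first-layer units with a 3-frame window, 3 class units with a 5-frame window). These sizes, the random initialization, the rule of summing class activations over time, and all helper names (`time_delay_layer`, `init_layer`, `count_params`, `tdnn_scores`) are assumptions for illustration, not the configuration or code used in this study.

```python
import math
import random

def time_delay_layer(frames, weights, biases, window):
    """One time-delay layer (illustrative helper, not from the paper).

    frames:  list of T frames, each a list of n_in features
    weights: weights[j][d][i] links output unit j to feature i at delay d
    biases:  one bias per output unit
    window:  number of consecutive frames each unit sees (delays 0..window-1)
    The same weights are reused at every time shift, so the layer is
    shift-invariant and returns T - window + 1 output frames.
    """
    out = []
    for t in range(len(frames) - window + 1):
        frame_out = []
        for j in range(len(weights)):
            s = biases[j]
            for d in range(window):
                for i, x in enumerate(frames[t + d]):
                    s += weights[j][d][i] * x
            frame_out.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid unit
        out.append(frame_out)
    return out

def init_layer(n_in, n_out, window, rng):
    # Small random weights; the initialization scheme is an assumption.
    weights = [[[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(window)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def count_params(n_in, n_out, window):
    # Weights are shared across time, so capacity does not depend on T.
    return n_out * (window * n_in + 1)

def tdnn_scores(frames, layers):
    h = frames
    for weights, biases, window in layers:
        h = time_delay_layer(h, weights, biases, window)
    # Assumed integration rule: sum each class unit over the remaining frames.
    return [sum(frame[k] for frame in h) for k in range(len(h[0]))]

rng = random.Random(0)
N_FEAT, N_FRAMES = 16, 15            # assumed input: 15 frames x 16 coefficients
arch = [(N_FEAT, 8, 3), (8, 3, 5)]   # (n_in, n_out, window) per layer
layers = [(*init_layer(n_in, n_out, w, rng), w) for n_in, n_out, w in arch]

# Usage: score one random (untrained) input token.
x = [[rng.gauss(0.0, 1.0) for _ in range(N_FEAT)] for _ in range(N_FRAMES)]
scores = tdnn_scores(x, layers)
label = ["b", "d", "g"][max(range(3), key=scores.__getitem__)]
total = sum(count_params(*a) for a in arch)
print(label, total)  # total is 515 weights and biases for these sizes
```

Because `count_params` depends only on layer widths and window lengths, it gives a simple way to compare the capacity of, say, several small per-speaker modules against one wider single TDNN, which is the kind of comparison the abstract draws.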
