Abstract

The circuitry and pathways in the brains of humans and other species have long inspired researchers and system designers to develop accurate and efficient systems capable of solving real-world problems and responding in real-time. We propose the Syllable-Specific Temporal Encoding (SSTE) to learn vocal sequences in a reservoir of Izhikevich neurons, by forming associations between exclusive input activities and their corresponding syllables in the sequence. Our model converts the audio signals to cochleograms using the CAR-FAC model to simulate a brain-like auditory learning and memorization process. The reservoir is trained using a hardware-friendly approach to FORCE learning. Reservoir computing could yield associative memory dynamics with far less computational complexity compared to RNNs. The SSTE-based learning enables competent accuracy and stable recall of spatiotemporal sequences with fewer reservoir inputs compared with existing encodings in the literature for similar purpose, offering resource savings. The encoding points to syllable onsets and allows recalling from a desired point in the sequence, making it particularly suitable for recalling subsets of long vocal sequences. The SSTE demonstrates the capability of learning new signals without forgetting previously memorized sequences and displays robustness against occasional noise, a characteristic of real-world scenarios. The components of this model are configured to improve resource consumption and computational intensity, addressing some of the cost-efficiency issues that might arise in future implementations aiming for compactness and real-time, low-power operation. Overall, this model proposes a brain-inspired pattern generation network for vocal sequences that can be extended with other bio-inspired computations to explore their potentials for brain-like auditory perception. Future designs could inspire from this model to implement embedded devices that learn vocal sequences and recall them as needed in real-time. Such systems could acquire language and speech, operate as artificial assistants, and transcribe text to speech, in the presence of natural noise and corruption on audio data.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.