Abstract

Conventional machine learning requires humans to train each module manually with hand-crafted data and symbols, and the resulting models are confined to particular tasks. To address this limitation, this paper presents a multimodal autonomous learning architecture based on a developmental network for the co-development of audition and vision. The developmental network is a biologically inspired mechanism that enables an agent to develop and integrate audition and vision simultaneously. Furthermore, synapse maintenance is introduced into the visual learning pathway to improve the video recognition rate, and a neuron regenesis mechanism is implemented to improve the network's usage efficiency. In the experiments, a number of fundamental words are acquired and recognized using the proposed learning method without any prior knowledge of the objects or the verbal questions. The experiments show that the proposed method achieves significantly higher recognition rates than the state-of-the-art method.
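To make the three mechanisms named above concrete, the sketch below shows one simplified way a developmental-network layer with top-1 Hebbian competition, synapse maintenance, and neuron regenesis could be organized. It is a hypothetical illustration, not the authors' implementation: the class SimpleDN, the 1.5x deviation threshold, the 10-sample warm-up, and the regenesis rule are all assumptions made for exposition.

```python
import numpy as np

class SimpleDN:
    """Toy developmental-network layer (illustrative only): bottom-up sensory
    input X and top-down motor/label input Z drive hidden Y neurons that
    learn by top-1 Hebbian competition with amnesic-average updates."""

    def __init__(self, x_dim, z_dim, n_neurons, seed=0):
        self.rng = np.random.default_rng(seed)
        self.wx = self.rng.random((n_neurons, x_dim))   # bottom-up weights
        self.wz = self.rng.random((n_neurons, z_dim))   # top-down weights
        self.age = np.zeros(n_neurons)                  # per-neuron firing count
        self.dev = np.zeros((n_neurons, x_dim))         # running |x - w| deviation
        self.mask = np.ones((n_neurons, x_dim))         # synaptogenic factors (1 = kept)

    @staticmethod
    def _unit(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    def respond(self, x, z):
        """Pre-response: mean of normalized bottom-up and top-down matches,
        using only the synapses each neuron still maintains."""
        r = np.empty(len(self.age))
        for i in range(len(self.age)):
            bu = self._unit(self.wx[i] * self.mask[i]) @ self._unit(x * self.mask[i])
            td = self._unit(self.wz[i]) @ self._unit(z)
            r[i] = 0.5 * (bu + td)
        return r

    def learn(self, x, z):
        win = int(np.argmax(self.respond(x, z)))        # top-1 competition
        self.age[win] += 1
        lr = 1.0 / self.age[win]                        # amnesic-average learning rate
        self.wx[win] = (1 - lr) * self.wx[win] + lr * self._unit(x)
        self.wz[win] = (1 - lr) * self.wz[win] + lr * self._unit(z)
        # Synapse maintenance (simplified): a synapse whose input deviates
        # persistently from its weight is treated as noisy and cut.
        self.dev[win] = (1 - lr) * self.dev[win] + lr * np.abs(self._unit(x) - self.wx[win])
        if self.age[win] > 10:                          # wait for stable statistics
            self.mask[win] = (self.dev[win] <= 1.5 * self.dev[win].mean()).astype(float)
        return win

    def regenesis(self, min_age=1):
        """Neuron regenesis (simplified): re-initialize neurons that never
        fired so their capacity can be reused by future inputs."""
        dead = self.age < min_age
        self.wx[dead] = self.rng.random(self.wx[dead].shape)
        self.wz[dead] = self.rng.random(self.wz[dead].shape)
        self.age[dead] = 0.0
```

In this reading, training would present paired sensory vectors x (audio or visual features) with label vectors z, while recognition would present x with a neutral z and read the winner's learned wz; the actual network in the paper integrates both modalities and is considerably richer than this single-layer sketch.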
