Fusing different modalities effectively is a critical issue in multimodal emotion recognition. Feature-level fusion methods cannot handle missing or corrupted data, while decision-level fusion methods may lose the correlation information between modalities. To address these problems, a hierarchy modular neural network (HMNN) is proposed and applied to multimodal emotion recognition. First, the HMNN is constructed to mimic the hierarchical modular architecture observed in the human brain. Each module contains several submodules that process features from different modalities. Connections are built between submodules within the same module and between corresponding submodules of different modules. Then, a learning algorithm based on Hebbian learning, which simulates the learning mechanism of the human brain, is used to train the connection weights of the HMNN. The HMNN predicts the label from the activity level of each module under a winner-take-all strategy. Finally, the proposed HMNN is evaluated on a public multimodal emotion recognition dataset. Experimental results show that the HMNN improves recognition performance compared with other decision-fusion methods, including the support vector machine and neural networks such as back-propagation and radial-basis-function networks. Furthermore, the inter-submodule connections within a module integrate information from different modalities and improve the performance of the HMNN. The experiments also demonstrate the effectiveness of the HMNN in dealing with missing or corrupted data.
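To make the described architecture concrete, the following is a minimal NumPy sketch of the general idea: one module per emotion class, one submodule (weight vector) per modality inside each module, a simple Hebbian weight update, and winner-take-all prediction over module activities. All names, dimensions, the exact Hebbian rule, and the treatment of missing modalities are illustrative assumptions, not the authors' implementation; in particular, the lateral inter-submodule connections described in the abstract are omitted here for brevity.

```python
# Illustrative sketch of an HMNN-style classifier (assumed details, not
# the paper's implementation).
import numpy as np

class HMNNSketch:
    def __init__(self, n_classes, modality_dims, lr=0.01):
        self.lr = lr
        # One module per class; each module holds one weight vector
        # (submodule) per input modality.
        self.weights = [
            [np.zeros(d) for d in modality_dims] for _ in range(n_classes)
        ]

    def module_activity(self, features):
        # features: list with one vector per modality, or None when that
        # modality is missing/corrupted; missing modalities are skipped,
        # so the prediction degrades gracefully instead of failing.
        return np.array([
            sum(w @ x for w, x in zip(submodules, features) if x is not None)
            for submodules in self.weights
        ])

    def train_step(self, features, label):
        # Hebbian-style update: strengthen connections between the
        # active inputs and the submodules of the correct-label module.
        for m, x in enumerate(features):
            if x is not None:
                self.weights[label][m] += self.lr * x

    def predict(self, features):
        # Winner-take-all: the module with the highest activity gives
        # the predicted label.
        return int(np.argmax(self.module_activity(features)))

# Toy usage on synthetic data (two hypothetical modalities).
rng = np.random.default_rng(0)
model = HMNNSketch(n_classes=4, modality_dims=[32, 16])
for _ in range(200):
    y = int(rng.integers(4))
    x = [rng.normal(y, 1.0, 32), rng.normal(y, 1.0, 16)]
    model.train_step(x, y)
# Prediction still works when the second modality is missing (None).
print(model.predict([rng.normal(2, 1.0, 32), None]))
```

In this sketch, handling a missing modality amounts to excluding its submodule from the activity sum, which is one plausible reading of why a modular design tolerates missing or corrupted inputs better than feature-level concatenation.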