Abstract

Taste classification of Chinese recipes rarely achieves satisfactory results when based on single-modal data, yet few studies have explored multimodal analysis in this field. In this paper, we propose to tackle taste classification of Chinese recipes with image and text fusion algorithms. First, visual and textual features are extracted by separate models: a convolutional neural network (CNN) constructed for visual feature extraction, and a pretrained word2vec model combined with a multi-layer perceptron for textual feature extraction. Second, two fusion strategies, feature-level fusion and decision-level fusion, are designed to perform multimodal fusion for the final taste prediction. Several experiments are carried out with K-fold cross-validation to verify the effectiveness of the proposed model. The results show that the multimodal fusion model for taste classification is superior to models based on single-modal features. Moreover, decision-level fusion outperforms feature-level fusion on the task of taste classification for Chinese recipes.
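The abstract does not include implementation details, so the following is only a minimal sketch of how the two fusion strategies differ, assuming a PyTorch-style implementation; the module names (VisualCNN, TextMLP), layer sizes, number of taste classes, and the averaging rule for decision-level fusion are all illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: the paper does not publish its architecture, so
# every layer size, name, and the number of taste classes below are assumptions.

class VisualCNN(nn.Module):
    """Small CNN producing a fixed-length visual feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):                 # x: (B, 3, H, W) recipe image batch
        return self.fc(self.conv(x).flatten(1))

class TextMLP(nn.Module):
    """MLP over an averaged word2vec embedding of the recipe text."""
    def __init__(self, emb_dim=300, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, e):                 # e: (B, emb_dim) pre-averaged word2vec vectors
        return self.mlp(e)

class FeatureLevelFusion(nn.Module):
    """Feature-level fusion: concatenate both feature vectors, classify jointly."""
    def __init__(self, n_tastes=5, feat_dim=128):
        super().__init__()
        self.visual = VisualCNN(feat_dim)
        self.text = TextMLP(feat_dim=feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, n_tastes)

    def forward(self, img, emb):
        fused = torch.cat([self.visual(img), self.text(emb)], dim=1)
        return self.classifier(fused)     # (B, n_tastes) logits

class DecisionLevelFusion(nn.Module):
    """Decision-level fusion: classify each modality separately, then combine."""
    def __init__(self, n_tastes=5, feat_dim=128):
        super().__init__()
        self.visual = VisualCNN(feat_dim)
        self.text = TextMLP(feat_dim=feat_dim)
        self.head_v = nn.Linear(feat_dim, n_tastes)
        self.head_t = nn.Linear(feat_dim, n_tastes)

    def forward(self, img, emb):
        p_v = self.head_v(self.visual(img)).softmax(dim=1)
        p_t = self.head_t(self.text(emb)).softmax(dim=1)
        return (p_v + p_t) / 2            # simple average of per-modality probabilities

if __name__ == "__main__":
    img = torch.randn(4, 3, 224, 224)     # dummy batch of recipe images
    emb = torch.randn(4, 300)             # dummy averaged word2vec vectors
    print(FeatureLevelFusion()(img, emb).shape)   # torch.Size([4, 5]) logits
    print(DecisionLevelFusion()(img, emb).shape)  # torch.Size([4, 5]) probabilities
```

In this sketch, decision-level fusion averages the two softmax distributions with equal weight; the abstract does not specify the combination rule, so a weighted sum or voting scheme would be equally plausible.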
