Abstract

A new multi-view learning algorithm is proposed by modifying multimodal convolutional neural networks (m-CNN), originally developed for image-text matching, to use both images and texts for classification. First, a pre-trained CNN and a word embedding model are applied to extract visual features and to represent each word of a text as a vector, respectively. Second, textual features are extracted by applying a CNN to the text data. Finally, the features extracted by the text and image CNNs are concatenated and fed into a convolutional layer, which learns the salient feature information in the integrated image-text representation. The features from this convolutional layer are then passed to a fully connected layer that performs the classification. Experimental results demonstrate that the proposed algorithm outperforms other data fusion methods for flower classification on a dataset of flower images and their Korean descriptions. More specifically, its accuracy is 10.1% and 14.5% higher than that of the m-CNN and multimodal recurrent neural network algorithms, respectively. The proposed method can thus significantly improve the performance of flower classification. The code and related data are publicly available via our GitHub repository.
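To make the described pipeline concrete, the PyTorch sketch below assembles the stages in order: a pre-trained image CNN, a text CNN over word embeddings, concatenation of the two feature vectors, a fusion convolution, and a fully connected classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: the ResNet-18 backbone, the `TextCNN` class, all layer widths, and the kernel sizes are illustrative choices.

```python
# Minimal sketch of the image-text fusion pipeline described in the abstract.
# The backbone, layer sizes, and vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class TextCNN(nn.Module):
    """1-D CNN over word-embedding sequences (hypothetical sizes)."""

    def __init__(self, vocab_size=10000, embed_dim=300, out_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, out_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, tokens):                  # tokens: (B, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (B, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        return self.pool(x).squeeze(-1)         # (B, out_dim)


class MultimodalClassifier(nn.Module):
    """Concatenate image and text features, convolve, then classify."""

    def __init__(self, num_classes, txt_dim=256):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.img_cnn = nn.Sequential(*list(backbone.children())[:-1])
        img_dim = backbone.fc.in_features       # 512 for ResNet-18
        self.text_cnn = TextCNN(out_dim=txt_dim)
        # Treat the concatenated vector as a 1-channel sequence so a
        # convolution can learn local interactions across modalities.
        self.fusion_conv = nn.Conv1d(1, 32, kernel_size=5, padding=2)
        self.classifier = nn.Linear(32 * (img_dim + txt_dim), num_classes)

    def forward(self, image, tokens):
        img_feat = self.img_cnn(image).flatten(1)       # (B, 512)
        txt_feat = self.text_cnn(tokens)                # (B, txt_dim)
        fused = torch.cat([img_feat, txt_feat], dim=1)  # (B, 512 + txt_dim)
        x = torch.relu(self.fusion_conv(fused.unsqueeze(1)))
        return self.classifier(x.flatten(1))            # (B, num_classes)


# Example forward pass with dummy inputs (batch of 4, 20-token descriptions).
model = MultimodalClassifier(num_classes=10)
logits = model(torch.randn(4, 3, 224, 224), torch.randint(0, 10000, (4, 20)))
```

The key design point mirrored here is that fusion happens before the final classifier: rather than concatenating features and classifying directly, an extra convolution over the joint representation lets the model weight interactions between visual and textual features.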
