Abstract
In this paper, a feature fusion method with guided training (FGT-Net) is constructed to fuse image data and numerical data for recognition tasks that cannot be classified accurately from images alone. The proposed structure is divided into a shared weight network part, a feature fusion layer part, and a classification layer part. First, a guided training method is proposed to optimize the training process: representative images and training images are fed into the shared weight network, which learns to extract image features more effectively. The image features and numerical features are then fused in the feature fusion layer and passed to the classification layer for the classification task. The loss is calculated from the outputs of both the shared weight network and the classification layer. Experiments are carried out to verify the effectiveness of the proposed model. The results show that FGT-Net achieves an accuracy of 87.8%, which is 15% higher than the ShuffleNetv2 CNN model (which can process image data only) and 9.8% higher than the DNN method (which processes structured data only).
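The abstract states that the loss combines the output of the shared weight network (the guided training term) with the output of the classification layer. A minimal sketch of such a combined objective, assuming a simple mean-squared guiding term and a weighting factor `lam` (a hypothetical hyperparameter, not specified in the paper):

```python
import numpy as np

def cross_entropy(probs, label):
    """Classification loss on the classifier's softmax output."""
    return float(-np.log(probs[label] + 1e-12))

def guiding_loss(feat_train, feat_rep):
    """Guiding term: mean squared distance pulling a training image's
    feature vector toward the same-class representative image's features."""
    return float(np.mean((feat_train - feat_rep) ** 2))

def total_loss(probs, label, feat_train, feat_rep, lam=0.5):
    # Combined objective: classification loss plus the guiding term.
    # lam and the MSE form of the guiding term are assumptions for
    # illustration; the paper's exact formulation may differ.
    return cross_entropy(probs, label) + lam * guiding_loss(feat_train, feat_rep)
```

During training, both terms would be minimized jointly, so the shared weight network is pushed to produce class-consistent features while the classifier fits the fused representation.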
Highlights
In order to identify objects directly from images, researchers have proposed the convolutional neural network (CNN), a deep learning model similar to the multilayer perceptron of artificial neural networks, which regards each pixel of the image as a feature.
The extracted dynamic differential entropy (DDE) features were classified by convolutional neural networks. Therefore, here, we propose using auxiliary information, such as weight and age, to help further classification by fusing the features to distinguish objects in different images.
The feature vectors generated by the shared weight network layer can be guided to be closer to the feature vectors of the same class in the representative image set, bringing feature vectors generated by images of the same class closer together in the model. This is the main purpose of our proposed guided training. Then, to solve the problem that specific tasks cannot be correctly identified from images alone, we propose feature fusion. The fused feature vector Xf is obtained by fusing the feature vector Xt obtained from SWN2 with the feature vector Xe composed of additional numerical data.
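The fusion step described above can be sketched as simple concatenation of the image features Xt with the numerical features Xe, followed by a linear classification layer. This is an illustrative sketch under the assumption that fusion is concatenation; the paper's fusion layer may differ, and the weights `W`, `b` are hypothetical:

```python
import numpy as np

def fuse_features(x_t, x_e):
    """Fuse the image feature vector x_t (from SWN2) with the
    numerical feature vector x_e into the fused vector x_f."""
    return np.concatenate([x_t, x_e])

def classify(x_f, W, b):
    """A single linear classification layer with softmax
    applied to the fused feature vector."""
    logits = W @ x_f + b
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()
```

For example, a 3-dimensional image feature and a 2-dimensional numerical feature (such as weight and age) yield a 5-dimensional fused vector fed to the classifier.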
Summary
In order to identify objects directly from images, researchers have proposed the convolutional neural network (CNN), a deep learning model similar to the multilayer perceptron of artificial neural networks, which regards each pixel of the image as a feature. CNNs are commonly used to analyze visual images. The first generation of CNN was LeNet [1], proposed by LeCun in 1998. After AlexNet, many excellent CNN models have appeared, along three main development directions: (a) deeper: networks with more layers, represented by VggNet [4] and ResNet [5]; (b) modularization: modular network structures (Inception), represented by GoogleNet [6], Inceptionv2 [7], Inceptionv3 [8], and Inceptionv4 [9]; (c) faster: lightweight network models for mobile devices, represented by SqueezeNet [10], MobileNet [11], ShuffleNet [12], MobileNetv2 [13], ShuffleNetv2 [14], and MobileNetv3 [15].