This paper proposes a method for classifying emotions from visual attention through eye gaze analysis. Concretely, we propose tensor-based emotional category classification via visual attention-based heterogeneous convolutional neural network (CNN) feature fusion. Because human emotions are related to how visual attention changes over time, the proposed method introduces a new gaze-based image representation designed to reflect these temporal changes in visual attention. Furthermore, since the emotions evoked in humans are closely related to the objects in images, our method uses a CNN model to obtain CNN features that represent the characteristics of those objects. To improve how well the emotional categories are represented, we extract multiple CNN features from the novel gaze-based image representation and fuse them by constructing a novel tensor composed of these features. This tensor construction realizes the visual attention-based heterogeneous CNN feature fusion and is the main contribution of this paper. Finally, applying logistic tensor regression with general tensor discriminant analysis to the newly constructed tensor makes emotional category classification feasible. Experimental results show that the proposed method classifies emotional categories with an F1-measure of approximately 0.6, an improvement of about 10% over comparative methods, including state-of-the-art methods, which verifies its effectiveness.
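
To make the tensor construction concrete, the following is a minimal sketch of one way it could look. It is not the paper's implementation. The abstract does not specify the gaze-based representation, the CNN models, or the tensor layout, so everything here is an assumption made for illustration: gaze is modeled as fixation points grouped into time windows, each window weights the image with a Gaussian attention map, the two "extractors" are simple pooling placeholders standing in for real CNN features, and the tensor is arranged as (time windows, feature types, feature dimension). All function names are hypothetical.

```python
import numpy as np


def gaze_weighted_image(image, fixations, sigma=20.0):
    """Weight pixels by a Gaussian attention map built from (x, y) fixations.

    Assumption: this is only one plausible gaze-based representation.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    attention = np.zeros((h, w), dtype=np.float64)
    for fx, fy in fixations:
        attention += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2.0 * sigma ** 2))
    if attention.max() > 0:
        attention /= attention.max()  # normalize so the peak weight is 1
    return image * attention[..., None]  # broadcast over color channels


def mean_pool_extractor(img, dim=8):
    """Placeholder for one CNN feature: mean over `dim` chunks of the pixels."""
    return np.array([chunk.mean() for chunk in np.array_split(img.ravel(), dim)])


def max_pool_extractor(img, dim=8):
    """Placeholder for a second, different CNN feature: max over `dim` chunks."""
    return np.array([chunk.max() for chunk in np.array_split(img.ravel(), dim)])


def build_feature_tensor(image, fixations_per_window, extractors):
    """Stack features into a tensor of shape (windows, extractors, dim)."""
    rows = []
    for fixations in fixations_per_window:
        weighted = gaze_weighted_image(image, fixations)
        rows.append(np.stack([extract(weighted) for extract in extractors]))
    return np.stack(rows)


# Synthetic example: a random 64x64 RGB image and three time windows of gaze.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
fixations_per_window = [
    [(10, 12), (14, 15)],  # early fixations
    [(32, 30)],            # middle of viewing
    [(50, 48), (55, 52)],  # late fixations
]
tensor = build_feature_tensor(
    image, fixations_per_window, [mean_pool_extractor, max_pool_extractor]
)
print(tensor.shape)
```

In this sketch the first tensor mode carries the change in visual attention over time and the second mode carries the heterogeneous feature types, so a tensor-based classifier can exploit both jointly instead of working on one concatenated vector. The classification step (logistic tensor regression with general tensor discriminant analysis) is not shown.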