Abstract
Emotion is highly subjective, and different regions of an image can affect emotion to different degrees. The key to image emotion recognition is therefore to effectively mine discriminative local regions. We present a deep architecture that guides the network to extract discriminative and diverse affective semantic information. First, we train a fully convolutional network with a cross-channel max pooling (CCMP) strategy to extract discriminative feature maps. Second, to ensure that most of the discriminative sentiment regions are located accurately, we add a module consisting of a convolution layer and CCMP. After the first module has identified its discriminative regions, the feature elements corresponding to those regions are erased, and the erased features are fed into the second module. This adversarial erasure operation forces the network to discover additional sentiment-discriminative regions. Third, an adaptive feature fusion mechanism is proposed to better integrate the discriminative and diverse sentiment representations. Extensive experiments on the benchmark datasets FI, EmotionROI, Instagram, and Twitter1 achieve recognition accuracies of 72.17%, 61.13%, 81.97%, and 85.44%, respectively, demonstrating that the proposed network outperforms the state of the art.
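The two core operations named in the abstract, cross-channel max pooling and adversarial erasure of discriminative regions, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the group count, the erasure threshold rule, and the function names are assumptions introduced for illustration only.

```python
import torch


def cross_channel_max_pool(feats: torch.Tensor, num_groups: int) -> torch.Tensor:
    """CCMP sketch: split the C channels into `num_groups` groups and keep,
    at each spatial location, the maximum response within each group,
    yielding an (N, num_groups, H, W) tensor."""
    n, c, h, w = feats.shape
    assert c % num_groups == 0, "channel count must be divisible by num_groups"
    grouped = feats.view(n, num_groups, c // num_groups, h, w)
    return grouped.max(dim=2).values


def erase_discriminative(feats: torch.Tensor, pooled: torch.Tensor,
                         thresh: float = 0.8) -> torch.Tensor:
    """Adversarial erasure sketch: zero out feature elements at locations
    whose pooled response exceeds `thresh` times the per-image peak, so a
    subsequent module must look elsewhere. The thresholding rule here is
    a hypothetical choice, not taken from the paper."""
    saliency = pooled.mean(dim=1, keepdim=True)        # (N, 1, H, W) saliency map
    peak = saliency.amax(dim=(2, 3), keepdim=True)     # per-image maximum response
    keep = (saliency < thresh * peak).float()          # 1 = keep, 0 = erase
    return feats * keep                                # broadcast mask over channels
```

In the described pipeline, a second convolution-plus-CCMP module would process the erased features, and the representations of both modules would then be combined by the adaptive feature fusion mechanism, whose learned weighting the sketch omits.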