CNN RNN Research Articles

With the proliferation of online video publishing, the amount of multimodal content on the Internet has grown exponentially. Research on emotion analysis has shifted from traditional single-modal analysis to more complex multimodal analysis. However, even among recent studies that consider multimodal analysis, few have paid attention to the visual emotion information obtained by merging visual and audio emotional information at the feature or decision level. In this paper, we extract visual, textual, and audio information from video and propose a multimodal emotion classification framework to capture the emotions of users in social networks. We design a 3DCLS (3D Convolutional Long Short-Term Memory) hybrid model to classify visual emotions and a CNN–RNN hybrid model to classify text-based emotions. Finally, the visual, audio, and text modalities are combined to produce the final emotion classification results. Experiments on the MOUD and IEMOCAP emotion datasets show that the proposed framework outperforms existing models in multimodal emotion analysis.
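The abstract says only that the visual, audio, and text modalities are combined. It does not say how. The following is a minimal sketch that assumes decision-level (late) fusion by weighted averaging of each modality's class probabilities. The label set, the modality weights, and the function names `late_fusion` and `predict` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical emotion label set, for illustration only.
LABELS = ["angry", "happy", "neutral", "sad"]


def late_fusion(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Assumed decision-level fusion: weighted average of per-modality class probabilities.

    probs maps a modality name (e.g. "visual", "audio", "text") to a probability
    vector over LABELS; weights maps the same names to non-negative weights.
    """
    total = sum(weights[m] for m in probs)
    if total <= 0:
        raise ValueError("modality weights must sum to a positive value")
    n = len(LABELS)
    fused = [0.0] * n
    for modality, p in probs.items():
        if len(p) != n:
            raise ValueError(f"{modality}: expected {n} scores, got {len(p)}")
        w = weights[modality] / total  # normalize so the fused vector stays a distribution
        for i, score in enumerate(p):
            fused[i] += w * score
    return fused


def predict(fused: List[float]) -> str:
    # Return the label with the highest fused probability.
    return LABELS[max(range(len(fused)), key=fused.__getitem__)]


if __name__ == "__main__":
    # Hypothetical outputs of the visual, audio, and text classifiers.
    probs = {
        "visual": [0.1, 0.6, 0.2, 0.1],
        "audio": [0.2, 0.3, 0.4, 0.1],
        "text": [0.1, 0.5, 0.3, 0.1],
    }
    fused = late_fusion(probs, {"visual": 1.0, "audio": 1.0, "text": 1.0})
    print(predict(fused))  # happy
```

With equal weights these example probabilities fuse to roughly [0.133, 0.467, 0.3, 0.1]. Late fusion keeps each modality's classifier independent, so a modality whose signal is unreliable can be down-weighted. Feature-level fusion would instead concatenate the modalities' features before a single classifier.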

Recognizing multi-label images is a significant but challenging task toward high-level visual understanding. Remarkable success has been achieved by CNN–RNN-based models, which capture the underlying semantic dependencies among labels and predict label distributions from the global-level features output by the CNN. However, such global-level features often fuse information from multiple objects, which makes it difficult to recognize small objects and capture label correlations. To address this problem, we propose a novel multi-label image classification framework that improves on the CNN–RNN design pattern. By introducing an attention network module into the CNN–RNN architecture, object features are separated by the channels of the attention map and then sent to the LSTM network, which captures label dependencies and predicts labels sequentially. A category-wise max-pooling operation then integrates these per-step predictions into the final prediction. Experimental results on the PASCAL2007 and MS-COCO datasets demonstrate that our model effectively exploits the correlation between labels to improve classification performance and better recognizes small objects.
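The final integration step is described concretely enough to sketch. Assume the LSTM emits one score vector over all categories at each step. Category-wise max-pooling then keeps, for each category, the highest score it received at any step. The minimal sketch below covers only that pooling and a thresholding step that reads off the predicted label set. The attention module and LSTM are omitted. The category names, the example scores, the 0.5 threshold, and the helper `labels_above` are hypothetical.

```python
from typing import List, Sequence


def category_wise_max_pool(step_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse T per-step score vectors (T x C) into one C-length vector.

    Entry c of the result is the maximum score that category c received
    at any of the T prediction steps.
    """
    if not step_scores:
        raise ValueError("need at least one prediction step")
    num_classes = len(step_scores[0])
    if any(len(step) != num_classes for step in step_scores):
        raise ValueError("every step must score the same number of categories")
    return [max(step[c] for step in step_scores) for c in range(num_classes)]


def labels_above(scores: Sequence[float], names: Sequence[str], threshold: float = 0.5) -> List[str]:
    # Hypothetical decision rule: keep every category whose pooled score reaches the threshold.
    return [name for name, score in zip(names, scores) if score >= threshold]


if __name__ == "__main__":
    names = ["person", "dog", "car", "bottle"]  # hypothetical categories
    # Hypothetical scores from three LSTM steps, each focused on a different object.
    step_scores = [
        [0.9, 0.1, 0.2, 0.0],
        [0.3, 0.8, 0.1, 0.1],
        [0.2, 0.2, 0.1, 0.6],
    ]
    pooled = category_wise_max_pool(step_scores)
    print(pooled)                       # [0.9, 0.8, 0.2, 0.6]
    print(labels_above(pooled, names))  # ['person', 'dog', 'bottle']
```

One plausible reason to pool with a maximum rather than an average: a small object that is recognized confidently at only one step still keeps its high score, instead of being diluted by steps that focused on other objects.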

Articles published on CNN RNN

An Energy-Efficient Deep Reinforcement Learning Accelerator With Transposable PE Array and Experience Compression

A dual CNN–RNN for multiple people tracking

A social emotion classification approach using multi-model fusion

CARF-Net: CNN attention and RNN fusion network for video-based person reidentification

Multi-label image classification with recurrently learning semantic dependencies
