Abstract

We test the suitability of our novel deep spectrum feature representation for speech-based sentiment analysis. Deep spectrum features are formed by passing spectrograms through a pre-trained image convolutional neural network (CNN) and have been shown to capture useful emotion information in speech; however, their usefulness for sentiment analysis has yet to be investigated. Using a data set of movie reviews collected from YouTube, we compare deep spectrum features combined with the bag-of-audio-words (BoAW) paradigm against a state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) based BoAW system on a binary sentiment classification task. Key results indicate that both feature representations are suitable for the proposed task, with the deep spectrum features achieving an unweighted average recall of 74.5%. The results provide further evidence for the effectiveness of deep spectrum features as a robust feature representation for speech analysis.
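The pipeline the abstract describes (spectrogram extraction, per-frame CNN features, then bag-of-audio-words quantisation against a codebook) can be sketched roughly as below. This is a minimal illustration, not the authors' system: the pre-trained image CNN is replaced here by a fixed random projection stand-in, and all parameters (window size, feature dimension, codebook size) are assumed for the example.

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (frames x bins)."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def cnn_features(spec, rng, dim=64):
    """Stand-in for the pre-trained image CNN: a fixed random projection
    plus ReLU. In the real system, activations of a fully connected layer
    of an image CNN fed the spectrogram plot would be used instead."""
    w = rng.standard_normal((spec.shape[1], dim)) / np.sqrt(spec.shape[1])
    return np.maximum(spec @ w, 0.0)  # (frames, dim)

def boaw_histogram(feats, codebook):
    """Quantise each frame feature to its nearest codebook entry ("audio
    word") and return the normalised histogram of word counts."""
    dists = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)        # 1 s of placeholder audio
spec = spectrogram(audio)
feats = cnn_features(spec, rng)
codebook = rng.standard_normal((32, feats.shape[1]))  # e.g. k-means centres
bow = boaw_histogram(feats, codebook)     # fixed-length utterance vector
```

The resulting fixed-length `bow` vector is what a downstream classifier (e.g. an SVM) would consume for the binary sentiment decision; in practice the codebook would be learned by clustering features from the training set rather than sampled at random.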
