Document sentiment classification has been studied for decades. Sentiment classification of Email data, however, is a specialized field that has not yet been thoroughly explored. Compared with typical social media and review data, Email data is characterized by high variance in length, duplicated content introduced by replies and forwarded messages, and implicit sentiment indicators. Because of these characteristics, existing techniques cannot fully capture the complex syntactic and relational structure among words and phrases in Email documents. In this study, we introduce a dependency graph-based position encoding technique enhanced with weighted sentiment features and incorporate it into the feature representation process. We combine the encoded sentiment sequence features with traditional word embedding features as the input to a revised deep CNN model for Email sentiment classification. Experiments are conducted on three real-world Email datasets after appropriate label conversion. Empirical results indicate that our proposed SSE-CNN model achieved the highest accuracy on all three Email datasets (88.6%, 74.3% and 82.1%, respectively), outperforming the state-of-the-art algorithms it was compared against. Furthermore, our evaluations of the preprocessing and sentiment sequence encoding steps confirm that Email preprocessing, together with sentiment sequence encoding based on dependency graph positions and SWN features, improves Email document sentiment classification.
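The abstract does not give the exact encoding formula, so the following is only a minimal sketch of one plausible reading of "dependency graph-based position encoding enhanced with weighted sentiment features". It assumes a hypothetical rule in which every token receives the sum, over sentiment-bearing words, of that word's lexicon score divided by one plus its distance to the token in the undirected dependency tree. The dependency heads are hand-written rather than produced by a parser, the lexicon scores are toy values (not actual SWN scores), and the helper names `dependency_distances`, `sentiment_sequence` and `combine` are hypothetical. The CNN itself is not shown; the sketch stops at building the per-token input vectors.

```python
from collections import deque


def dependency_distances(heads, source):
    """BFS distances from token `source` to every token over the undirected
    dependency tree. heads[i] is the index of token i's head, or -1 for the root."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sentiment_sequence(tokens, heads, lexicon):
    """Hypothetical encoding: token i gets sum over sentiment words w of
    score(w) / (1 + graph_distance(i, w)). Not the paper's confirmed formula."""
    features = [0.0] * len(tokens)
    for j, tok in enumerate(tokens):
        score = lexicon.get(tok.lower())
        if score is None:
            continue  # token carries no lexicon sentiment
        for i, d in enumerate(dependency_distances(heads, j)):
            if d is not None:
                features[i] += score / (1 + d)
    return features


def combine(embeddings, sent_features):
    """Append the scalar sentiment feature to each token's word embedding."""
    return [list(e) + [s] for e, s in zip(embeddings, sent_features)]


# Toy usage: hand-written parse and made-up lexicon scores (assumptions).
tokens = ["I", "really", "appreciate", "your", "help"]
heads = [2, 2, -1, 4, 2]  # "appreciate" is the root
toy_lexicon = {"appreciate": 0.625, "help": 0.25}
feats = sentiment_sequence(tokens, heads, toy_lexicon)
toy_embeddings = [[0.1, 0.2] for _ in tokens]  # stand-in word embeddings
inputs = combine(toy_embeddings, feats)  # one (embedding + sentiment) row per token
print(feats)
```

Using graph distance rather than linear token offset lets a sentiment word influence syntactically related words even when they are far apart in the surface text, which is the kind of relational structure the abstract argues sequential features miss.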