Influence of noise on transfer learning in Chinese sentiment classification using GRU

Mingjun Dai, Shansong Huang, Chenguang Yang, Junpei Zhong, Shiwei Yang

doi:10.1109/fskd.2017.8393047

Abstract

Sentiment classification of product reviews provides valuable business feedback for manufacturers, sellers, and users. However, because a large amount of training data is not always available for a specific product domain, transfer learning is often used in sentiment analysis applications. Specifically, after pre-training word embeddings on a large Chinese corpus, we trained a Gated Recurrent Unit (GRU) model on a larger training set from one specific domain, and then used the trained model to test sentiment classification on a smaller set of product reviews. We also examined the performance of this transfer learning method, in particular to test the different factors that affect it. The experimental results showed that differences in wording between review domains (which we call “noise”) have a greater impact on transfer learning. We also calculated the difference in wording to verify this hypothesis. Based on these results, we explored the impact of dataset wording on Chinese text sentiment classification. These findings also shed light on how to optimize transfer learning in general.
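The abstract does not state how the difference in wording between domains is calculated. As a rough illustration only, the sketch below compares the vocabularies of two review sets using the Jaccard distance and an out-of-vocabulary rate. Both measures, the function names, and the assumption that reviews are already segmented into word lists are hypothetical choices made for this example, not the authors' method.

```python
def vocabulary(reviews):
    """Collect the set of distinct tokens from pre-segmented reviews.

    Each review is assumed to be a list of word tokens (an assumption;
    Chinese text would first need a word segmenter).
    """
    return {token for review in reviews for token in review}


def wording_difference(source_reviews, target_reviews):
    """Jaccard distance between two vocabularies.

    Returns 0.0 when the vocabularies are identical and 1.0 when they
    share no words. This metric is an illustrative assumption.
    """
    src = vocabulary(source_reviews)
    tgt = vocabulary(target_reviews)
    union = src | tgt
    if not union:
        return 0.0
    return 1.0 - len(src & tgt) / len(union)


def out_of_vocabulary_rate(source_reviews, target_reviews):
    """Fraction of target-domain tokens never seen in the source domain."""
    src = vocabulary(source_reviews)
    tokens = [token for review in target_reviews for token in review]
    if not tokens:
        return 0.0
    return sum(token not in src for token in tokens) / len(tokens)


if __name__ == "__main__":
    # Toy, made-up reviews from two hypothetical product domains.
    phone_reviews = [["手机", "很", "好"], ["屏幕", "不", "清晰"]]
    clothing_reviews = [["衣服", "很", "好"], ["质量", "不", "行"]]
    print(wording_difference(phone_reviews, clothing_reviews))
    print(out_of_vocabulary_rate(phone_reviews, clothing_reviews))
```

A larger value from either function would indicate that the target reviews use wording the source-domain model has seen less often, which is the kind of "noise" the abstract describes.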

Full Text