Abstract

Social media text written in Chinese communities often mixes scripts: the major text is written in Chinese, an ideograph-based writing system, and some minor text is written in Latin letters, an alphabet-based writing system. This phenomenon is called writing systems changes (WSCs). Past studies have shown that WSCs can be used to express emotions, particularly where the social and political environment is more conservative. However, because WSCs can break the syntax of the major text, they pose additional challenges for Natural Language Processing (NLP) tasks such as emotion classification. In this work, we present a novel deep learning based method that uses WSCs as an effective feature for emotion analysis. The method first identifies all WSC points. A representation of the major text is then learned by an LSTM model, while a representation of the minor text is learned by a separate CNN model. Emotions in the minor text are further highlighted through an attention mechanism before emotion classification. Performance evaluation shows that incorporating WSC features using deep learning models improves F1-scores compared to the state-of-the-art model.

Highlights

  • Emotion analysis has been studied using different Natural Language Processing (NLP) methods from a variety of linguistic perspectives such as semantic, syntactic, and cognitive properties (Barbosa and Feng, 2010; Balamurali et al, 2011; Liu and Zhang, 2012; Wilson et al, 2013; Joshi and Itkat, 2014; Long et al, 2017)

  • This paper presents our work in progress which uses a novel deep learning based method to incorporate textual features associated with writing systems changes (WSCs) via an attention mechanism

  • This paper presents work in progress on a Hybrid Attention Network (HAN) model based on a Long Short-Term Memory (LSTM) model for emotion analysis in the context of WSCs in social media

Summary

Introduction

Emotion analysis has been studied using different NLP methods from a variety of linguistic perspectives such as semantic, syntactic, and cognitive properties (Barbosa and Feng, 2010; Balamurali et al, 2011; Liu and Zhang, 2012; Wilson et al, 2013; Joshi and Itkat, 2014; Long et al, 2017). In many areas, such as Hong Kong and the Chinese Mainland, social media text is often mixed: the major text is written in Chinese characters, an ideograph-based writing system, and some minor text is written in Latin letters, an alphabet-based writing system. WSCs can break the syntax of the major text, and the switched minor text lacks linguistic cues in this type of social media data (Dos Santos and Gatti, 2014). This makes methods based on feature engineering difficult to apply. The attention mechanism is achieved by projecting the major text representation into attention vectors while aggregating the representations of the informative words from the WSC context.
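The attention step described above can be sketched in plain Python. This is an illustrative dot-product attention under stated assumptions, not the HAN model itself: the projection matrix `W`, the scoring function, and the names `matvec`, `softmax`, and `attend` are all hypothetical, and the summary does not give the actual formulation.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(major_repr, W, minor_word_vecs):
    """Weight minor-text word vectors by their match with the projected major text.

    Assumption: the attention vector is W @ major_repr and the score is a
    dot product; the paper may use a different projection or scoring form.
    """
    query = matvec(W, major_repr)
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in minor_word_vecs]
    weights = softmax(scores)
    dim = len(minor_word_vecs[0])
    # Aggregate: weighted sum of the minor-text word vectors.
    context = [
        sum(a * vec[d] for a, vec in zip(weights, minor_word_vecs))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs: a 2-d major-text vector, identity projection, two minor-text words.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                          [[2.0, 0.0], [0.0, 2.0]])
```

In this toy case the first minor-text word aligns with the projected major-text vector, so it receives the larger weight and dominates the aggregated context vector.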

The Hybrid Attention Network Model
Performance Evaluation
Dataset and Statistics
Analysis of WSCs Linked to Emotions
Effects of Different Types of Text
Findings
Conclusion and Future Work