Abstract
Affective analysis of social media text is in great demand. Online text written in Chinese communities often mixes scripts: the major text is written in Chinese, an ideograph-based writing system, and the minor text uses Latin letters, an alphabet-based writing system. This phenomenon is referred to as writing systems changes (WSCs). Past studies have shown that WSCs often reflect unfiltered, immediate emotions. However, WSCs pose additional challenges for Natural Language Processing tasks because they can break the syntax of the major text. In this work, we use WSCs as an effective feature in a hybrid deep learning model with an attention network. The WSC scripts are first identified by their encoding range. Then, the document representation of the text is learned through a Long Short-Term Memory model, and the minor text is learned by a separate Convolutional Neural Network model. To further highlight the WSC components, an attention mechanism is adopted to re-weight the feature vector before the classification layer. Experiments show that the proposed hybrid deep learning method, which better incorporates WSC features, improves performance over state-of-the-art classification models. The experimental results indicate that WSCs can serve as effective information in affective analysis of social media text.
Highlights
In social media, text is becoming increasingly important due to its effectiveness in disseminating information in highly individualized and opinionated contexts
This paper presents a hybrid deep learning model with an attention network for affective analysis in the context of writing systems changes
We argue that Writing Systems Changes (WSCs) text is potentially informative, and that a learning model needs to be designed so that this additional information can be captured in deep learning based models for emotion classification
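The attention re-weighting mentioned in the highlights can be sketched in a few lines. This is a generic dot-product attention, not the paper's formulation: `hidden_states` stands in for per-token document features, `query` stands in for a feature vector derived from the minor text, and both names and the scoring function are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Weight each hidden state by its dot-product similarity to the query.

    Returns (weights, context), where context is the weighted sum of the
    hidden states. Hypothetical stand-in for the attention layer.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    weights, context = attend([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print(weights, context)
```

The state most similar to the query receives the largest weight, so features aligned with the minor text dominate the vector passed on to the classifier.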
Summary
Text is becoming increasingly important due to its effectiveness in disseminating information in highly individualized and opinionated contexts. The minor text can be written in English (as shown in E1), Pinyin (phonetic notation for Chinese; shown in short form in E2), or other new Internet notations that use Roman characters from some Latin-based writing system, as well as other symbolic expressions, e.g. emoji symbols as shown in E3. This phenomenon of using mixed scripts in different writing systems is known as Writing Systems Changes (WSCs). The alternation between different writing systems is relatively common on real-time platforms such as microblogs in China. This feature offers reliable clues for affective analysis.
More From: International Journal of Machine Learning and Cybernetics