Abstract
Rumors on social networks can cause real-world harm, so detecting them has practical significance. Research on Chinese rumor detection is relatively mature, but Cantonese rumors have received far less attention. Cantonese, a major Chinese dialect, has more than 60 million speakers worldwide, yet Cantonese rumor detection faces several major challenges. First, no benchmark dataset of Cantonese rumors is available. Second, learning the unique linguistic characteristics of Cantonese is difficult. Third, traditional rumor detection approaches cannot be applied directly to Cantonese rumors. We therefore propose a novel framework for Cantonese rumor detection that uses deep neural networks with feature fusion. To the best of our knowledge, this is the first study of Cantonese rumor detection on social networks. Specifically, we build a Cantonese rumor dataset and a multi-domain Cantonese corpus. Next, we extract 27 statistical features, seven of which are newly proposed. We then design a novel deep learning model, BLA, to identify Cantonese rumors; it generates text and Jyutping embeddings using a further pre-trained BERT model and a CNN model. The BLA model then integrates the statistical and semantic features to classify Cantonese rumors. Experiments show that BLA achieves strong Cantonese rumor detection performance, with an F1 score of 0.9225.
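The abstract does not say exactly how BLA fuses its inputs. As a rough illustration only, the sketch below assumes the simplest scheme: concatenating a text embedding, a Jyutping embedding, and the 27 statistical features into one vector, then scoring that vector with a single linear layer and a sigmoid. The function names, embedding sizes, and weights are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

# From the abstract: 27 statistical features are extracted per post.
NUM_STAT_FEATURES = 27


def fuse_features(text_emb: Sequence[float],
                  jyutping_emb: Sequence[float],
                  stat_feats: Sequence[float]) -> List[float]:
    """Fuse semantic and statistical features by plain concatenation.

    ASSUMPTION: concatenation is a stand-in for whatever fusion BLA uses.
    """
    if len(stat_feats) != NUM_STAT_FEATURES:
        raise ValueError(
            f"expected {NUM_STAT_FEATURES} statistical features, got {len(stat_feats)}"
        )
    return list(text_emb) + list(jyutping_emb) + list(stat_feats)


def rumor_probability(fused: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Hypothetical classifier head: one linear layer followed by a sigmoid."""
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    z = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy sizes: a 4-d text embedding and a 3-d Jyutping embedding (illustrative only).
    fused = fuse_features([0.1] * 4, [0.2] * 3, [0.0] * NUM_STAT_FEATURES)
    print(len(fused))  # 4 + 3 + 27 = 34
    print(rumor_probability(fused, [0.0] * len(fused), 0.0))  # zero weights give 0.5
```

Concatenation is the most basic form of feature fusion. The actual BLA model may combine its features differently, and this sketch does not attempt to reproduce its architecture or training.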