A deep multiple-instance text binary classification for topic relevant content extraction on social media

Juan Yin,Xiaoyang Liu,Zhewen Yang

doi:10.1016/j.jksuci.2023.101883

Open Access

https://doi.org/10.1016/j.jksuci.2023.101883

Abstract

Social media platforms hold rich text data that can be used for data mining and analysis. However, natural language evolves rapidly on social media, and social media data is very noisy, which poses a great challenge to the accuracy of data analysis. To overcome this problem, we propose topic-relevant content extraction (TRCE), a method based on deep multiple-instance classification. TRCE leverages existing information and the hierarchical relationships among texts under a thread on social media as weak supervision to extract strongly topic-relevant data and accurately filter out noise without manually labeled data. The proposed method introduces latent variables, the Bernoulli distribution, and variational inference into multiple-instance learning (MIL) to generate pseudo labels. We then employ a dual-stream neural network with a three-stage training process to train MIL end-to-end. Experimental results show that TRCE improves significantly over other MIL methods, while its accuracy and F1 score are only slightly lower than those of supervised text classification. Given that TRCE needs no manually labeled data at all, whereas supervised classification relies heavily on labeled data, TRCE is a competitive method for extracting topic-relevant data and filtering out noise on social media.
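
As a rough illustration of the multiple-instance setting the abstract describes (and not the authors' TRCE model), the sketch below treats a thread as a bag and each post as an instance with a Bernoulli relevance probability. It assumes the standard MIL rule that a bag is positive when at least one instance is positive, and that instances are independent. The function names, the 0.5 threshold, and the example probabilities are hypothetical.

```python
from math import prod
from typing import List


def bag_probability(instance_probs: List[float]) -> float:
    """P(at least one instance is relevant), assuming independent Bernoulli instances."""
    return 1.0 - prod(1.0 - p for p in instance_probs)


def pseudo_labels(instance_probs: List[float], bag_label: int,
                  threshold: float = 0.5) -> List[int]:
    """Hypothetical pseudo-labeling rule derived from the standard MIL assumption."""
    if bag_label == 0:
        # A negative bag contains no positive instances.
        return [0] * len(instance_probs)
    labels = [1 if p >= threshold else 0 for p in instance_probs]
    if instance_probs and not any(labels):
        # A positive bag must contain at least one positive instance,
        # so promote the most probable one.
        best = max(range(len(instance_probs)), key=instance_probs.__getitem__)
        labels[best] = 1
    return labels


if __name__ == "__main__":
    thread = [0.9, 0.2, 0.6]  # made-up per-post relevance probabilities
    print(bag_probability(thread))
    print(pseudo_labels(thread, bag_label=1))
```

A learned model would replace the fixed probabilities with network outputs; the variational inference and three-stage training mentioned in the abstract are not reproduced here.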

Journal: Journal of King Saud University - Computer and Information Sciences
Publication Date: Dec 29, 2023
License type: cc-by-nc-nd

Similar Papers

Going Viral: The 3 Rs of Social Media Messaging during Public Health Emergencies.
Bhavini Patel Murthy ... Sara J Vagi
Health security | VOL. 19
01 Feb 2021

Cross-attention-based saliency inference for predicting cancer metastasis on whole slide images.
Ziyu Su ... Muhammad Khalid Khan Niazi
IEEE journal of biomedical and health informatics | VOL. PP
01 Dec 2024

ReMix: A General and Efficient Framework for Multiple Instance Learning Based Whole Slide Image Classification
Jiawei Yang ... Lei He
01 Jan 2021

Twitter Archives and the Challenges of "Big Social Data" for Media and Communication Research
Jean Burgess ... Axel Bruns
M/C Journal | VOL. 15
11 Oct 2012
