Supervised Text Classification Research Articles

Background: The massive scale of social media platforms requires an automatic solution for detecting hate speech. Such automatic solutions help reduce the need for manual analysis of content. Most previous literature has cast hate speech detection as a supervised text classification task using classical machine learning methods or, more recently, deep learning methods. However, work investigating this problem in Arabic cyberspace is still limited compared to the published work on English text.

Objective: This study aims to identify hate speech related to the COVID-19 pandemic posted by Twitter users in the Arab region and to discover the main issues discussed in tweets containing hate speech.

Methods: We used the ArCOV-19 dataset, an ongoing collection of Arabic tweets related to COVID-19 starting from January 27, 2020. Tweets were analyzed for hate speech using a pretrained convolutional neural network (CNN) model; each tweet was given a score between 0 and 1, with 1 being the most hateful text. We also used nonnegative matrix factorization to discover the main issues and topics discussed in hate tweets.

Results: The analysis of Twitter data in the Arab region showed that non-hate tweets greatly outnumbered hate tweets; the percentage of hate tweets among COVID-19–related tweets was 3.2% (11,743/547,554). The analysis also revealed that the majority of hate tweets (8385/11,743, 71.4%) contained a low level of hate based on the score provided by the CNN. This study identified Saudi Arabia as the Arab country from which the most COVID-19 hate tweets originated during the pandemic. Furthermore, the largest number of hate tweets appeared between March 1 and 30, 2020, representing 51.9% of all hate tweets (6095/11,743).
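The topic-discovery step relies on nonnegative matrix factorization (NMF). As a rough, stdlib-only illustration (not the study's pipeline), the sketch below factors a tiny, made-up document-term count matrix V into nonnegative factors W (document-topic weights) and H (topic-term weights) using the multiplicative updates of Lee and Seung. The English placeholder vocabulary, the toy counts, k=2, and 200 iterations are all assumptions; a real pipeline would tokenize the Arabic tweets, usually apply TF-IDF weighting, and would more likely call a library implementation such as scikit-learn's `sklearn.decomposition.NMF`.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as lists of rows."""
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative n x m matrix V as W @ H with nonnegative
    W (n x k) and H (k x m), using Lee-Seung multiplicative updates that
    do not increase the squared Frobenius reconstruction error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        Wt = transpose(W)
        num_h, den_h = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num_h[a][j] / (den_h[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num_w, den_w = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num_w[i][a] / (den_w[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def top_terms(H, vocab, n_top=3):
    """List the n_top highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n_top]]
            for row in H]


# Hypothetical toy corpus: 6 "tweets" x 6 placeholder terms (raw counts).
vocab = ["mask", "lockdown", "curfew", "oil", "price", "market"]
V = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
]

W, H = nmf(V, k=2)
for t, terms in enumerate(top_terms(H, vocab)):
    print(f"topic {t}: {terms}")
```

Each row of H ranks the vocabulary for one topic, so its highest-weighted terms act as a topic label. On this block-structured toy matrix one would expect the two topics to split along the two word groups, but the multiplicative updates only guarantee a non-increasing error, not a globally optimal factorization.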
Contrary to expectations, the spread of COVID-19–related hate speech on Twitter in the Arab region was only weakly correlated with the spread of the pandemic, based on the Pearson correlation coefficient (r=0.1982, P=.50). The study also identified the topics commonly discussed in hate tweets during the pandemic; analysis of the 7 extracted topics showed that 6 of them were related to hate speech against China and Iran. Arab users also discussed topics related to political conflicts in the Arab region during the COVID-19 pandemic.

Conclusions: The COVID-19 pandemic poses serious public health challenges to nations worldwide. During the pandemic, frequent use of social media can contribute to the spread of hate speech. Hate speech on the web can have a negative impact on society and may be directly correlated with real hate crimes, which increases the threat associated with being targeted by hate speech and abusive language. This study is the first to analyze hate speech in the context of Arabic COVID-19–related tweets in the Arab region.
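The weak correlation reported in this abstract is a Pearson coefficient between two time series. A minimal stdlib sketch follows, assuming the two series are per-period counts of hate tweets and of COVID-19 cases aligned on the same dates; the study's exact aggregation is not stated here, and the numbers below are invented placeholders. The t statistic r * sqrt((n - 2) / (1 - r^2)), compared against Student's t with n - 2 degrees of freedom, is what a P value such as P=.50 is derived from; if SciPy is available, `scipy.stats.pearsonr` returns both r and the P value directly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0; compare against Student's t with n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-period counts aligned on the same periods (placeholders, not study data).
hate_tweets = [35, 80, 410, 620, 540, 300, 210, 190]
covid_cases = [0, 2, 15, 120, 640, 1500, 2600, 3900]

r = pearson_r(hate_tweets, covid_cases)
print(f"r = {r:.4f}, t = {t_statistic(r, len(hate_tweets)):.3f} with {len(hate_tweets) - 2} df")
```

Pearson's r only captures linear association; a rank-based measure such as Spearman's rho would be a reasonable cross-check when counts are highly skewed, as epidemic curves often are.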

Background: Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and can be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources.

Objective: The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between those needs and current education materials.

Methods: This approach applies computational methods for content modeling and supervised text classification to data from online health forums to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval.

Results: We provide a new taxonomy for classifying sentences in online health forum data. A total of 260 postings from two online health forums were selected, yielding 4179 sentences for coding. After annotating the data and training alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores ranging from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. A total of 136 expressions of need were used to generate queries to indexed education materials. Upon examination of the top two pages retrieved for each query, 12% (17/136) of queries were found to have relevant content by all coders, and 33% (45/136) were judged to have relevant content by at least one coder.

Conclusions: Text from online health forums can be analyzed effectively using automated methods.
Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they typically receive, as our results suggest that at most a third of breast cancer survivors’ questions would be addressed by the materials currently provided to them.
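The F1 scores in this abstract come from one-versus-others evaluation: each information type is scored as a binary problem in which that type is the positive class and every other type is negative. The sketch below shows only that scoring step in plain Python; it does not reproduce the random forest classifiers themselves (which could be built with, for example, scikit-learn's `RandomForestClassifier`). The function names and toy labels are hypothetical; only "Medical" and "Other" echo category names from the abstract, and "Lifestyle" is a made-up placeholder.

```python
def one_vs_others_f1(gold, pred, positive):
    """F1 for a single class, treating `positive` as the positive label
    and every other label as negative (one-versus-others)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_f1(gold, pred):
    """One-versus-others F1 for every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return {label: one_vs_others_f1(gold, pred, label) for label in labels}


# Hypothetical sentence-level annotations (gold) and classifier outputs (pred).
gold = ["Medical", "Medical", "Other", "Lifestyle", "Medical", "Other"]
pred = ["Medical", "Other", "Other", "Lifestyle", "Medical", "Medical"]

for label, f1 in per_class_f1(gold, pred).items():
    print(f"{label}: F1 = {f1:.2f}")
```

Reporting a separate F1 per information type, rather than a single accuracy figure, is what makes a range such as "66% (Other) to 90% (Medical)" possible: rare or heterogeneous classes like "Other" tend to score lower than well-defined ones.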

Articles published on Supervised Text Classification

Hoax Analyzer for Indonesian News Using Deep Learning Models

Sentiment Analysis for Software Engineering Domain in Turkish

Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach

Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering

A study of deep learning methods for same-genre and cross-genre author profiling

Repression Technology: Internet Accessibility and State Violence

A framework for understanding citizens’ political participation in social media

Type of Supervised Text Classification System for Unstructured Text Comments using Probability Theory Technique

Assessing Unmet Information Needs of Breast Cancer Survivors: Exploratory Study of Online Health Forums Using Text Classification and Retrieval

TREMO: A dataset for emotion analysis in Turkish

Online suicide prevention through optimised text classification

Text classification method based on self-training and LDA topic models

Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

Unsupervised Identification of Translationese

Dataless Text Classification with Descriptive LDA

A text processing pipeline to extract recommendations from radiology reports

Text Classification Method for Data Cleaning

Partially Supervised Text Classification with Multi-Level Examples

Thematic content analysis using supervised machine learning: An empirical evaluation using German online news