Unlabeled Tweets Research Articles

Background: More than 6 million people in the United States have Alzheimer disease and related dementias, receiving help from more than 11 million family or other informal caregivers. A range of traditional interventions has been developed to support family caregivers; however, most of them have not been implemented in practice and remain largely inaccessible. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter for interventions.

Objective: The objective of this study is to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia.

Methods: Between May 4 and May 20, 2021, we collected 10,733 tweets, posted by 8846 users, that mention a dementia-related keyword, a linguistic marker that potentially indicates a diagnosis, and a select familial relationship. Three annotators annotated 1 random tweet per user to distinguish those that indicate having a family member with dementia from those that do not. Interannotator agreement was 0.82 (Fleiss kappa). We used the annotated tweets to train and evaluate support vector machine and deep neural network classifiers. To assess the scalability of our approach, we then deployed automatic classification on unlabeled tweets that were continuously collected between May 4, 2021, and March 9, 2022.

Results: A deep neural network classifier based on a BERT (bidirectional encoder representations from transformers) model pretrained on tweets achieved the highest F1-score of 0.962 (precision=0.946 and recall=0.979) for the class of tweets indicating that the user has a family member with dementia. The classifier detected 128,838 tweets that indicate having a family member with dementia, posted by 74,290 users between May 4, 2021, and March 9, 2022, that is, approximately 7500 users per month.

Conclusions: Our annotated data set can be used to automatically identify Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers’ experiences but also directly target interventions at these users.
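Two steps in this abstract lend themselves to a short illustration: sampling one random tweet per user for annotation, and the relationship between the reported precision, recall, and F1-score. The following is only a minimal sketch, not the authors' code. The function names, the `(user_id, text)` tuple layout, and the fixed seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_one_per_user(tweets, seed=0):
    """Pick one random tweet per user.

    `tweets` is an iterable of (user_id, text) pairs; this layout is a
    hypothetical stand-in for however the collected tweets are stored.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_user = defaultdict(list)
    for user_id, text in tweets:
        by_user[user_id].append(text)
    return {user_id: rng.choice(texts) for user_id, texts in by_user.items()}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy input: three tweets from two users (invented for illustration).
toy_tweets = [
    ("u1", "my mom was diagnosed with dementia"),
    ("u1", "visiting mom at the care home today"),
    ("u2", "my grandfather has alzheimer's"),
]
sampled = sample_one_per_user(toy_tweets)

# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.946, 0.979)
print(round(reported_f1, 3))  # 0.962
```

Plugging the reported precision and recall into the harmonic mean reproduces the reported F1-score of 0.962.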

Background: In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone.

Objective: The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in the United States that are not based on testing and, thus, may not have been reported to the Centers for Disease Control and Prevention.

Methods: Beginning January 23, 2020, we collected English tweets from the Twitter Streaming application programming interface that mention keywords related to COVID-19. We applied handwritten regular expressions to identify tweets indicating that the user potentially has been exposed to COVID-19. We automatically filtered out “reported speech” (eg, quotations, news headlines) from the tweets that matched the regular expressions, and two annotators annotated a random sample of 8976 tweets that are geo-tagged or have profile location metadata, distinguishing tweets that self-report potential cases of COVID-19 from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on bidirectional encoder representations from transformers (BERT). Finally, we deployed the automatic pipeline on more than 85 million unlabeled tweets that were continuously collected between March 1 and August 21, 2020.

Results: Interannotator agreement, based on dual annotations for 3644 (41%) of the 8976 tweets, was 0.77 (Cohen κ). A deep neural network classifier, based on a BERT model that was pretrained on tweets related to COVID-19, achieved an F1-score of 0.76 (precision=0.76, recall=0.76) for detecting tweets that self-report potential cases of COVID-19. Upon deploying our automatic pipeline, we identified 13,714 tweets that self-report potential cases of COVID-19 and have US state–level geolocations.

Conclusions: We have made the 13,714 tweets identified in this study, along with each tweet’s time stamp and US state–level geolocation, publicly available to download. This data set presents the opportunity for future work to assess the utility of Twitter data as a complementary resource for tracking the spread of COVID-19.
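The first two stages of the pipeline described above (regular-expression matching, then filtering out reported speech) could look roughly like the sketch below. The abstract does not give the study's actual expressions or filtering rules, so both patterns here are hypothetical and deliberately simplistic, and the function name `is_candidate` is invented for the example.

```python
import re

# Hypothetical pattern for a first-person statement of possible infection.
# The study's real handwritten expressions are not given in the abstract.
EXPOSURE = re.compile(
    r"\bI\s+(?:think|might|may)\s+(?:I\s+)?have\s+(?:covid|coronavirus|the\s+virus)\b",
    re.IGNORECASE,
)

# Hypothetical heuristics for "reported speech": retweets, tweets that open
# with a quotation mark, and tweets carrying a link (often news headlines).
REPORTED_SPEECH = re.compile(r"^\s*(?:RT\b|[\"“])|https?://", re.IGNORECASE)


def is_candidate(text):
    """Return True if a tweet matches the exposure pattern and does not
    look like reported speech under the heuristics above."""
    if not EXPOSURE.search(text):
        return False
    return REPORTED_SPEECH.search(text) is None


examples = [
    "I think I have covid, been coughing all week",
    '"I think I have covid," says mayor',
    "RT @user: I might have coronavirus",
    "Great weather today",
]
candidates = [t for t in examples if is_candidate(t)]
```

In the study, tweets passing this kind of filter were then annotated and used to train BERT-based classifiers; the sketch covers only the rule-based pre-filtering.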

Articles published on Unlabeled Tweets

ACOVMD: Automatic COVID‐19 misinformation detection in Twitter using self‐trained semi‐supervised hybrid deep learning model

Q8VaxStance: Dataset Labeling System for Stance Detection towards Vaccines in Kuwaiti Dialect

Automatically Identifying Twitter Users for Interventions to Support Dementia Family Caregivers: Annotated Data Set and Benchmark Classification Models.

Weakly labeled data augmentation for social media named entity recognition

A Novel Annotation Scheme to Generate Hate Speech Corpus through Crowdsourcing and Active Learning

Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter

Identifying Covid-19 misinformation tweets and learning their spatio-temporal topic dynamics using Nonnegative Coupled Matrix Tensor Factorization.

COVID-19 outbreak: An ensemble pre-trained deep learning model for detecting informative tweets

An enhanced cosine-based visual technique for the robust tweets data clustering

Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.

Improved semi-supervised learning technique for automatic detection of South African abusive language on Twitter

Towards countering hate speech against journalists on social media

An Active Learning Based Emoji Prediction Method in Turkish

Personalized spoiler detection in tweets by using support vector machine

Towards scaling Twitter for digital epidemiology of birth defects

Prediction of Personal Experience Tweets of Medication Use via Contextual Word Representations.

Unsupervised collective-based framework for dynamic retraining of supervised real-time spam tweets detection model

Cross-Modality Microblog Sentiment Prediction via Bi-Layer Multimodal Hypergraph Learning

Statistical Features-Based Real-Time Detection of Drifted Twitter Spam

Enhancing Semantic Role Labeling for Tweets Using Self-Training

