Large Text Corpora Research Articles

Smartphone-based apps are increasingly used to prevent relapse among people with substance use disorders (SUDs). These systems collect a wealth of data from participants, including the content of messages exchanged in peer-to-peer support forums. How individuals self-disclose and exchange social support in these forums may provide insight into their recovery course, but manual review of a large text corpus by human coders is inefficient. This study evaluated the feasibility of applying supervised machine learning (ML) to perform large-scale content analysis of an online peer-to-peer discussion forum. Machine-coded data were also used to examine how communication styles relate to writers' substance use and well-being outcomes.

Data were collected from a smartphone app that connects patients with SUDs to online peer support via a discussion forum. Overall, 268 adult patients with SUD diagnoses were recruited from 3 federally qualified health centers in the United States beginning in 2014. Two waves of survey data were collected to measure demographic characteristics and study outcomes: at baseline (before accessing the app) and after 6 months of app use. Messages were downloaded from the peer-to-peer forum and manually content-analyzed. These hand-coded data were used to train supervised ML algorithms, using features extracted with the Linguistic Inquiry and Word Count (LIWC) system, to automatically identify the types of expression relevant to peer-to-peer support. Regression analyses examined how each expression type was associated with recovery outcomes.

Our manual content analysis identified 7 expression types relevant to the recovery process: emotional support, informational support, negative affect, change talk, insightful disclosure, gratitude, and universality disclosure. Over 6 months of app use, 86.2% (231/268) of participants posted on the app's support forum. Of these, 93.5% (216/231) posted at least 1 message in the content categories of interest, generating 10,503 messages. Supervised ML algorithms trained on the hand-coded data achieved F1 scores ranging from 0.57 to 0.85. Regression analyses showed that a greater proportion of messages giving emotional support to peers was related to reduced substance use. For self-disclosure, a greater proportion of messages expressing universality was related to improved quality of life, whereas a greater proportion of negative affect expressions was negatively related to quality of life and mood.

This study highlights a natural language processing method with the potential to provide real-time insights into peer-to-peer communication dynamics. First, our ML approach allowed large-scale content coding while retaining moderate-to-high accuracy (F1 scores of 0.57 to 0.85). Second, individuals' expression styles were associated with recovery outcomes. Emotional support, universality disclosure, and negative affect were significantly related to these outcomes, and attending to these dynamics may be important for appropriate intervention.
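
The abstract reports a per-category F1 score for the trained classifiers and relates each participant's proportion of messages in a category to recovery outcomes. The minimal Python sketch below illustrates both computations under stated assumptions: each message carries exactly one label (the study's coding scheme may allow overlapping codes), the machine-assigned labels are taken as given (the LIWC feature extraction and the study's classifiers are not reproduced), and the function names `binary_f1` and `category_proportions` and the label strings are hypothetical.

```python
from collections import Counter, defaultdict


def binary_f1(y_true, y_pred, positive):
    """F1 for one expression type, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def category_proportions(labeled_messages):
    """Map each participant to the share of their messages in each category.

    labeled_messages: iterable of (participant_id, label) pairs, one label per
    message (a hypothetical simplification; real codes may overlap).
    """
    counts = defaultdict(Counter)
    for participant_id, label in labeled_messages:
        counts[participant_id][label] += 1
    proportions = {}
    for participant_id, label_counts in counts.items():
        total = sum(label_counts.values())
        proportions[participant_id] = {
            label: n / total for label, n in label_counts.items()
        }
    return proportions
```

For example, with hand-coded labels `["emotional_support", "informational_support", "emotional_support", "negative_affect"]` and predictions `["emotional_support", "emotional_support", "emotional_support", "negative_affect"]`, `binary_f1(..., "emotional_support")` has precision 2/3 and recall 1, giving an F1 of 0.8. Which messages form the denominator of each proportion (all posts or only coded posts) is an assumption here; the per-participant proportions would then serve as predictors in the outcome regressions.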

Automatically detecting anxiety disorders from speech could provide a useful screening tool. Prior studies have shown that individual words in textual transcripts of speech are associated with anxiety severity. Transformer-based neural networks are a recent class of models with strong predictive capabilities that draw on the context of multiple input words rather than single words in isolation. Transformers learn linguistic patterns and can then be fine-tuned to make specific predictions based on these patterns. This study aimed to determine whether a transformer-based language model can screen for generalized anxiety disorder from impromptu speech transcripts.

A total of 2000 participants provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test (TSST) and completed the Generalized Anxiety Disorder 7-item (GAD-7) scale. A transformer-based neural network model (pretrained on large textual corpora) was fine-tuned on the speech transcripts and GAD-7 scores to predict whether each participant was above or below a GAD-7 screening threshold. We reported the area under the receiver operating characteristic curve (AUROC) on the test data and compared it with that of a baseline logistic regression model using Linguistic Inquiry and Word Count (LIWC) features as input. Using the integrated gradients method to identify the words that most strongly affect the predictions, we inferred the linguistic patterns that influence them.

The baseline LIWC-based logistic regression model had an AUROC of 0.58, and the fine-tuned transformer model achieved an AUROC of 0.64. The influence of words often implicated in the predictions also depended on context. For example, the first-person singular pronoun "I" pushed the model toward an anxious prediction 88% of the time and toward a nonanxious prediction 12% of the time, depending on the context. Silent pauses in speech, also often implicated in predictions, pushed the model toward an anxious prediction 20% of the time and toward a nonanxious prediction 80% of the time.

These results provide evidence that a transformer-based neural network model has greater predictive power than the single word-based LIWC model. We also showed that the use of specific words in a specific context (a linguistic pattern) is part of the reason for the better prediction. This suggests that transformer-based models could play a useful role in anxiety screening systems.
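
The second study compares models by AUROC and explains predictions with integrated gradients. The Python sketch below illustrates both ideas under explicit assumptions: `auroc` uses the rank (Mann-Whitney) formulation, counting ties as half a correct ordering, and `integrated_gradients` is applied to a hypothetical two-feature logistic model rather than the study's transformer, where attributions would typically be computed over token embeddings relative to a baseline input. The weights, inputs, and `steps` value are illustrative only.

```python
import math


def auroc(labels, scores):
    """AUROC via the rank (Mann-Whitney) formulation: the probability that a
    random positive (label 1) outscores a random negative (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Integrated gradients for a toy logistic model f(x) = sigmoid(w . x + b),
    approximating the path integral from `baseline` to `x` with a midpoint sum."""
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sums = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        s = sigmoid(sum(w * p for w, p in zip(weights, point)) + bias)
        # For this model, df/dx_i = sigmoid(z) * (1 - sigmoid(z)) * w_i.
        for i, w in enumerate(weights):
            grad_sums[i] += s * (1.0 - s) * w
    # Attribution_i = (x_i - baseline_i) * average gradient along the path.
    return [d * g / steps for d, g in zip(diffs, grad_sums)]
```

For instance, `auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns 0.75, because 3 of the 4 positive-negative pairs are ordered correctly. With `x = [1.0, 2.0]`, a zero baseline, `weights = [0.5, -1.0]`, and `bias = 0.1`, the first attribution is positive and the second negative, and their sum approximates `sigmoid(-1.4) - sigmoid(0.1)`, as the completeness property of integrated gradients requires; this gives a quick check on the numerical approximation.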

Articles published on Large Text Corpora

Comprehension and production of Kinyarwanda verbs in the Discriminative Lexicon

Media Representations of Healthcare Robotics in Norway 2000-2020: A Topic Modeling Approach

Gender stereotypes embedded in natural language are stronger in more economically developed and individualistic countries.

Identification of social scientifically relevant topics in an interview repository: a natural language processing experiment

Welcome to the University of life, can I take your order? Investigating Life Experience Degree Offerings in Diploma mills

Using Machine Learning of Online Expression to Explain Recovery Trajectories: Content Analytic Approach to Studying a Substance Use Disorder Forum.

The (moral) language of hate.

An overview of the consumer‐centric disruptive technology research: Insights from topic modelling and literature review

Annotation uncertainty in the context of grammatical change

Keyword/ Keyphrase Extraction from Text of Indian Election Domain

Regina Coeli—Doctrine and Iconography of the Virgin Mary’s Heavenly Royalty

Exploiting Structure in Regular Expression Queries

Algorithms propagate gender bias in the marketplace—with consumers’ cooperation

San-Eng: Sanskrit to English Translator using Machine Learning

The impact of big data on research methods in information science

Automated Approach for Digitalizing Scope of Work Requirements to Support Contract Management

Predicting Generalized Anxiety Disorder From Impromptu Speech Transcripts Using Context-Aware Transformer-Based Neural Networks: Model Evaluation Study.

Review of Natural Language Processing in Pharmacology.

An Informed Neural Network for Discovering Historical Documentation Assisting the Repatriation of Indigenous Ancestral Human Remains

Universal versus system-specific features of punctuation usage patterns in major Western languages
