Text Analysis Program Research Articles

Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT's underlying language model a serious concern. Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters). We conducted analyses using 22 existing Linguistic Inquiry and Word Count dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular "male" and "female" names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts. We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names). Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. 
Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names. ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios. Preregistration: OSF Registries, https://osf.io/ztv96.
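The dictionary-based approach described in this abstract can be illustrated with a minimal sketch. It assumes a LIWC-style measure in which each category is reported as a percentage of total words; the two word lists and the function name `category_rates` are hypothetical stand-ins, not the Linguistic Inquiry and Word Count dictionaries or the study's newly created dictionaries.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries for illustration only; these are NOT the
# LIWC dictionaries or the custom dictionaries built for the study.
CATEGORIES = {
    "communal": {"caring", "supportive", "warm", "helpful"},
    "agentic": {"ambitious", "confident", "independent", "decisive"},
}


def category_rates(text, categories=CATEGORIES):
    """Return each category's share of total words, as a percentage."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        # An empty text has no words, so every category rate is zero.
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(counts[w] for w in vocab) / total
        for name, vocab in categories.items()
    }


letter = "She is a caring and supportive colleague, confident in her work."
rates = category_rates(letter)
print(rates)
```

Computing such rates per letter, then comparing their distributions between letters generated for historically female and historically male names, is the general shape of the analysis; the statistical tests themselves are not sketched here.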


Objective: Cognitive tests requiring spoken responses, such as paragraph recall, are rich in cognitive-related information that is not captured using traditional scoring methods. This study aimed to determine whether linguistic features embedded in spoken responses can differentiate between individuals who are and are not cognitively impaired.
Participants and Methods: Participants in the Long Life Family Study completed a neuropsychological assessment that included the WMS-R Logical Memory I paragraph recall. For a subset of participants (N = 709), test responses were digitally recorded and manually transcribed. We used Linguistic Inquiry and Word Count, a text analysis program, to quantify word counts, grammatical features (e.g., prepositions, verb tenses), and the use of content words related to specific semantic categories (e.g., work-related, numbers) for immediate recall (IR) and delayed recall (DR). We used regression models with Generalized Estimating Equations adjusted for age, sex, education, and within-family correlation to select features associated with cognitive status (normal cognition [NC] versus cognitive impairment [CI]; Bonferroni-corrected threshold p < 0.001). Next, we developed a "polyfeature score" (PFS) for both immediate and delayed recall, each calculated as a weighted sum of the selected linguistic features. We then built a logistic regression model to evaluate the predictive value of each PFS for identifying cognitively impaired individuals. In secondary analyses, we used regression models as above to identify features associated with mild cognitive impairment subtype (amnestic [aMCI] versus nonamnestic [naMCI]; threshold p < .05).
Results: The sample included 599 participants with NC and 110 with CI (mean age = 72.3 ± 11.0 years, 54% female). The regression identified 8 linguistic features for IR and 7 for DR that significantly predicted cognitive status. Decreased use of content words related to work (e.g., employed, school, police) and biological processes (e.g., cook, cafeteria, eat) and the use of negations (e.g., no, not, can't) were predictive of cognitive impairment in both recall conditions. In contrast, the use of other content word categories was predictive of cognitive status in only one recall condition (IR: leisure, cognitive processes, space; DR: drives, number). The use of fewer prepositions in IR, more first-person pronouns in DR, and fewer words in the past tense in DR were each associated with cognitive impairment. Word count was not predictive of cognitive status. Both PFSs were highly associated with cognitive status (PFS_IR β = 0.74, p < 0.001; PFS_DR β = 0.86, p = 0.001) with high discriminative value (PFS_IR AUC = 0.93, sensitivity = 0.81, specificity = 0.91; PFS_DR AUC = 0.95, sensitivity = 0.77, specificity = 0.88). In the CI subset, linguistic features differed between those classified as aMCI (n = 24) and naMCI (n = 40). Two function word categories predicted aMCI in IR, whereas decreased word count, two function word categories, and two content word categories predicted aMCI in DR (all p < .05).
Conclusions: Linguistic features from paragraph recall provide high predictive value for classifying cognitive status, increasing the potential of paragraph recall as a cognitive screener in clinical settings. Additionally, each recall condition identified unique linguistic features associated with cognitive impairment, which may aid differentiation of cognitive impairment subtypes and elucidate processes underlying deficits in learning and recall.
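The "polyfeature score" described above, a weighted sum of selected linguistic features, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the feature names, weights, toy participants, and threshold are all hypothetical, and the sensitivity and specificity are computed at a single fixed cutoff instead of through the logistic regression model the abstract mentions.

```python
def polyfeature_score(features, weights):
    """Weighted sum of the selected linguistic features."""
    return sum(weights[name] * features[name] for name in weights)


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as impaired (label 1) and score results.

    Assumes both classes are present in `labels` (1 = cognitively
    impaired, 0 = normal cognition).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical weights: negative where lower use is taken to signal impairment.
weights = {"work": -0.5, "prepositions": -0.2, "negations": 0.4}

# Toy participants: (feature rates as % of words, label). Not study data.
participants = [
    ({"work": 4.0, "prepositions": 12.0, "negations": 0.5}, 0),
    ({"work": 3.0, "prepositions": 11.0, "negations": 1.0}, 0),
    ({"work": 1.0, "prepositions": 8.0, "negations": 2.0}, 1),
    ({"work": 0.5, "prepositions": 9.0, "negations": 1.5}, 1),
]

scores = [polyfeature_score(f, weights) for f, _ in participants]
labels = [y for _, y in participants]
sens, spec = sensitivity_specificity(scores, labels, threshold=-2.0)
print(scores, sens, spec)
```

In practice the weights would come from the fitted regression coefficients, and discrimination would be summarized across all thresholds (the AUC) rather than at one cutoff.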


Articles published on Text Analysis Program

Automated Assessment of Leadership Style and Professional Values of a Top-Manager and its Compliance with SDGs

Examining the relationship between parents' self-reported mindfulness and observed language use in attachment-relevant communication.

Trends in Influencer Marketing: A Bibliometric Analysis and Future Directions

Letter of Recommendation Characteristics Associated with Interview Offer to a Vascular Surgery Residency Program

What's in a Name? Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT.

Automated text analysis methods to identify the individual structure of motivation for sports and a healthy lifestyle

Delving Beyond the Test Score: Linguistic Markers of Cognitive Impairment on Paragraph Recall

Analyzing Leadership Messaging Styles and Team Performance in the NFL: Insights from Post-First Loss Press Conferences

Linguistic Differences by Gender in Letters of Recommendation for Maternal-Fetal Medicine Fellowship Applicants.

Detecting ulterior motives from verbal cues in group deliberations.

Analyzing the Extent to which Gender Bias Exists in News Articles Using Natural Language Processing Techniques

Digital ethics challengers in Russian and English media texts: Migrant discourse case study

Framing Requests for Help: Language and Gender in Parental Advice-Seeking Letters

Goal language is associated with attrition and weight loss on a digital program: Observational study.

Linguistic Differences in Letters of Recommendation for Maternal Fetal Medicine Fellowship Applicants [A49

Cognitive-Affective Styles of Biden and Trump Supporters: An Automated Text Analysis Study

Linguistic Differences by Gender in Letters of Recommendation for Minimally Invasive Gynecologic Surgery Fellowship Applicants

Cybersecurity Risk in U.S. Critical Infrastructure: An Analysis of Publicly Available U.S. Government Alerts and Advisories

Technologies of deep text analysis in musicology
