General Modeling Language Research Articles

Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort needed from well-trained human subject matter experts makes extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes. We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large contents of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts? We asked the same research question and used the same set of social media content for both the LLM selection of relevant topics and the LLM analysis of themes as was conducted manually in a published study about vaccine rhetoric. We used the results from that study as background for this LLM experiment by comparing the results from the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed if multiple LLMs had equivalent ability and assessed the consistency of repeated analysis from each LLM. The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P<.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. Regarding theme identification, LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Despite not consistently matching the human-generated themes, subject matter experts found themes generated by the LLMs were still reasonable and relevant. LLMs can effectively and efficiently process large social media-based health-related data sets. LLMs can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis from human subject matter experts by consistently extracting the same themes from the same data. There is vast potential, once better validated, for automated LLM-based real-time social listening for common and rare health conditions, informing public health understanding of the public's interests and concerns and determining the public's ideas to address them.

Read full abstract

Generative Large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges. This study aims to systematically review the use of generative LLMs, and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions. A Boolean search for peer-reviewed articles was conducted on May 19th, 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation matrices. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions. The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting, five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs, all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included but were not limited to the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses. Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.

Read full abstract

General Modeling Language Research Articles

Related Topics

Articles published on General Modeling Language

Parameterization before Meta-Analysis: Cross-Modal Embedding Clustering for Forest Ecology Question-Answering

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

A Framework for Agricultural Intelligent Analysis Based on a Visual Language Large Model

Nursing Education in the Era of ChatGPT: Implications and Opportunities

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments.

Enhancing Disease Detection in Electronic Medical Records: Integrating Human Expertise and Large Language Models with Application to Diabetes, Hypertension, and Acute Myocardial Infarction

Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein

Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

AI2 – Where Appreciative Inquiry and Artificial Intelligence Meet

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

Zero- and few-shot prompting of generative large language models provides weak assessment of risk of bias in clinical trials.

GPT-Based Model for Concise Summaries of Patient Health Information.

Exploring Offline Large Language Models for Clinical Information Extraction: A Study of Renal Histopathological Reports of Lupus Nephritis Patients.

Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review.

Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study.

De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model

Crystal Composition Transformer: Self-Learning Neural Language Model for Generative and Tinkering Design of Materials.

Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT

Disability Ethics and Education in the Age of Artificial Intelligence: Identifying Ability Bias in ChatGPT and Gemini

Email subjects generation with large language models: GPT-3.5, PaLM 2, and BERT

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

General Modeling Language Research Articles

Related Topics

Articles published on General Modeling Language

Parameterization before Meta-Analysis: Cross-Modal Embedding Clustering for Forest Ecology Question-Answering

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

A Framework for Agricultural Intelligent Analysis Based on a Visual Language Large Model

Nursing Education in the Era of ChatGPT: Implications and Opportunities

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments.

Enhancing Disease Detection in Electronic Medical Records: Integrating Human Expertise and Large Language Models with Application to Diabetes, Hypertension, and Acute Myocardial Infarction

Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein

Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

AI2 – Where Appreciative Inquiry and Artificial Intelligence Meet

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

Zero- and few-shot prompting of generative large language models provides weak assessment of risk of bias in clinical trials.

GPT-Based Model for Concise Summaries of Patient Health Information.

Exploring Offline Large Language Models for Clinical Information Extraction: A Study of Renal Histopathological Reports of Lupus Nephritis Patients.

Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review.

Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study.

De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model

Crystal Composition Transformer: Self-Learning Neural Language Model for Generative and Tinkering Design of Materials.

Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT

Disability Ethics and Education in the Age of Artificial Intelligence: Identifying Ability Bias in ChatGPT and Gemini

Email subjects generation with large language models: GPT-3.5, PaLM 2, and BERT