Advances In Natural Language Processing Research Articles

Background With advancements in natural language processing, tools such as Chat Generative Pre-Trained Transformers (ChatGPT) version 4.0 and Google Bard's Gemini Advanced are being increasingly evaluated for their potential in various medical applications. The objective of this study was to systematically assess the performance of these language learning models (LLMs) on both image and non-image-based questions within the specialized field of Ophthalmology. We used a review question bank for the Ophthalmic Knowledge Assessment Program (OKAP) used by ophthalmology residents nationally to prepare for the Ophthalmology Board Exam to assess the accuracy and performance of ChatGPT and Gemini Advanced. Methodology A total of 260 randomly generated multiple-choice questions from the OphthoQuestions question bank were run through ChatGPT and Gemini Advanced. A simulated 260-question OKAP examination was created at random from the bank. Question-specific data were analyzed, including overall percent correct, subspecialty accuracy, whether the question was "high yield," difficulty (1-4), and question type (e.g., image, text). To compare the performance of ChatGPT-4 and Gemini on the difficulty of questions, we utilized the standard deviation of user answer choices to determine question difficulty. In this study, a statistical analysis of Google Sheets was conducted using two-tailed t-tests with unequal variance to compare the performance of ChatGPT-4.0 and Google's Gemini Advanced across various question types, subspecialties, and difficulty levels. Results In total, 259 of the 260 questions were included in the study as one question used a video that any form of ChatGPT could not interpret as of May 1, 2024. For text-only questions, ChatGPT-4.0.0 correctly answered 57.14% (148/259, p < 0.018), and Gemini Advanced correctly answered 46.72% (121/259, p < 0.018). Both versions answered most questions without a prompt and would have received a below-average score on the OKAP. Moreover, there were 27 questions utilizing a secondary prompt in ChatGPT-4.0 compared to 67 questions in Gemini Advanced. ChatGPT-4.0 performed 68.99% on easier questions (<2 on a scale from 1-4) and 44.96% on harder questions (>2 on a scale from 1-4). On the other hand, Gemini Advanced performed 49.61% on easier questions (<2 on a scale from 1-4) and 44.19% on harder questions (>2 on a scale from 1-4). There was a statistically significant difference in accuracy between ChatGPT-4.0 and Gemini Advanced for easy (p < 0.0015) but not for hard (p < 0.55) questions. For image-only questions, ChatGPT-4.0 correctly answered 39.58% (19/48, p < 0.013), and Gemini Advanced correctly answered 33.33% (16/48, p < 0.022), resulting in a statistically insignificant difference in accuracy between ChatGPT-4.0 and Gemini Advanced (p < 0.530). A comparison against text-only and image-based questions demonstrated a statistically significant difference in accuracy for both ChatGPT-4.0 (p < 0.013) and Gemini Advanced (p < 0.022). Conclusions This study provides evidence that ChatGPT-4.0 performs better on the OKAP-style exams and is improved compared to Gemini Advanced within the context of ophthalmic multiple-choice questions. This may show an opportunity for increased worth for ChatGPT in ophthalmic medical education. While showing promise within medical education, caution should be used as a more detailed evaluation of reliability is needed.

Read full abstract

Psychosis poses substantial social and healthcare burdens. The analysis of speech is a promising approach for the diagnosis and monitoring of psychosis, capturing symptoms like thought disorder and flattened affect. Recent advancements in Natural Language Processing (NLP) methodologies enable the automated extraction of informative speech features, which has been leveraged for early psychosis detection and assessment of symptomology. However, critical gaps persist, including the absence of standardized sample collection protocols, small sample sizes, and a lack of multi-illness classification, limiting clinical applicability. Our study aimed to (1) identify an optimal assessment approach for the online and remote collection of speech, in the context of assessing the psychosis spectrum and evaluate whether a fully automated, speech-based machine learning (ML) pipeline can discriminate among different conditions on the schizophrenia-bipolar spectrum (SSD-BD-SPE), help-seeking comparison subjects (MDD), and healthy controls (HC) at varying layers of analysis and diagnostic complexity. We adopted online data collection methods to collect 20 minutes of speech and demographic information from individuals. Participants were categorized as "healthy" help-seekers (HC), having a schizophrenia-spectrum disorder (SSD), bipolar disorder (BD), major depressive disorder (MDD), or being on the psychosis spectrum with sub-clinical psychotic experiences (SPE). SPE status was determined based on self-reported clinical diagnosis and responses to the PHQ-8 and PQ-16 screening questionnaires, while other diagnoses were determined based on self-report from participants. Linguistic and paralinguistic features were extracted and ensemble learning algorithms (e.g., XGBoost) were used to train models. A 70%-30% train-test split and 30-fold cross-validation was used to validate the model performance. The final analysis sample included 1140 individuals and 22,650 minutes of speech. Using 5-minutes of speech, our model could discriminate between HC and those with a serious mental illness (SSD or BD) with 86% accuracy (AUC = 0.91, Recall = 0.7, Precision = 0.98). Furthermore, our model could discern among HC, SPE, BD and SSD groups with 86% accuracy (F1 macro = 0.855, Recall Macro = 0.86, Precision Macro = 0.86). Finally, in a 5-class discrimination task including individuals with MDD, our model had 76% accuracy (F1 macro = 0.757, Recall Macro = 0.758, Precision Macro = 0.766). Our ML pipeline demonstrated disorder-specific learning, achieving excellent or good accuracy across several classification tasks. We demonstrated that the screening of mental disorders is possible via a fully automated, remote speech assessment pipeline. We tested our model on relatively high number conditions (5 classes) in the literature and in a stratified sample of psychosis spectrum, including HC, SPE, SSD and BD (4 classes). We tested our model on a large sample (N = 1150) and demonstrated best-in-class accuracy with remotely collected speech data in the psychosis spectrum, however, further clinical validation is needed to test the reliability of model performance.

Read full abstract

Advances In Natural Language Processing Research Articles

Related Topics

Articles published on Advances In Natural Language Processing

AI-powered language learning: The role of NLP in grammar, spelling, and pronunciation feedback

Multi-label text classification via secondary use of large clinical real-world data sets.

Advancements in Natural Language Processing for Human-Computer Interaction

MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction

Advancements in Natural Language Processing: Implications, Challenges, and Future Directions

Addressing the risks of generative AI for the judiciary: The accountability framework(s) under the EU AI Act

Advancements in natural language processing: Bridging the gap between human and machine communication

Exploring Recent NLP Advances for Tamil: Word Vectors and Hybrid Deep Learning Architectures

Joint intent detection and slot filling with syntactic and semantic features using multichannel CNN-BiLSTM

Large language models and their applications in bioinformatics

Need of retrieval augmented generation for conversational assistants for government social welfare scheme popularization

Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.

Decomposing maintenance actions into sub-tasks using natural language processing: A case study in an Italian automotive company

Analysing Psychological Sentiment Prediction Across Modalities: Harnessing Emotion Datasets within Natural Language Processing (NLP)

Detecting schizophrenia, bipolar disorder, psychosis vulnerability and major depressive disorder from 5 minutes of online-collected speech.

Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

CODEX

Soil Organic Carbon Estimation via Remote Sensing and Machine Learning Techniques: Global Topic Modeling and Research Trend Exploration

Automated Speech Analysis in Bipolar Disorder: The CALIBER Study Protocol and Preliminary Results.

Unlocking the Potential of Free Text in Electronic Health Records with Large Language Models (LLM): Enhancing Patient Safety and Consultation Interactions.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Advances In Natural Language Processing Research Articles

Related Topics

Articles published on Advances In Natural Language Processing

AI-powered language learning: The role of NLP in grammar, spelling, and pronunciation feedback

Multi-label text classification via secondary use of large clinical real-world data sets.

Advancements in Natural Language Processing for Human-Computer Interaction

MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction

Advancements in Natural Language Processing: Implications, Challenges, and Future Directions

Addressing the risks of generative AI for the judiciary: The accountability framework(s) under the EU AI Act

Advancements in natural language processing: Bridging the gap between human and machine communication

Exploring Recent NLP Advances for Tamil: Word Vectors and Hybrid Deep Learning Architectures

Joint intent detection and slot filling with syntactic and semantic features using multichannel CNN-BiLSTM

Large language models and their applications in bioinformatics

Need of retrieval augmented generation for conversational assistants for government social welfare scheme popularization

Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.

Decomposing maintenance actions into sub-tasks using natural language processing: A case study in an Italian automotive company

Analysing Psychological Sentiment Prediction Across Modalities: Harnessing Emotion Datasets within Natural Language Processing (NLP)

Detecting schizophrenia, bipolar disorder, psychosis vulnerability and major depressive disorder from 5 minutes of online-collected speech.

Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

CODEX

Soil Organic Carbon Estimation via Remote Sensing and Machine Learning Techniques: Global Topic Modeling and Research Trend Exploration

Automated Speech Analysis in Bipolar Disorder: The CALIBER Study Protocol and Preliminary Results.

Unlocking the Potential of Free Text in Electronic Health Records with Large Language Models (LLM): Enhancing Patient Safety and Consultation Interactions.