Large Language Models for Psychological Assessment: A Comprehensive Overview

  • Abstract
  • Literature Map
  • References
  • Similar Papers
Abstract

Large language models (LLMs) are extraordinary tools demonstrating potential to improve the understanding of psychological characteristics. They provide an unprecedented opportunity to supplement self-report in psychology research and practice with scalable behavioral assessment. However, they also pose unique risks and challenges. In this article, we provide an overview and guide for psychological scientists to evaluate LLMs for psychological assessment. In the first section, we briefly review the development of transformer-based LLMs and discuss their advances in natural language processing. In the second section, we describe the experimental design process, including techniques for language data collection, audio processing and transcription, text preprocessing, and model selection, and analytic matters, such as model output, model evaluation, hyperparameter tuning, model visualization, and topic modeling. At each stage, we describe options, important decisions, and resources for further in-depth learning and provide examples from different areas of psychology. In the final section, we discuss important broader ethical and implementation issues and future directions for researchers using this methodology. The reader will develop an understanding of essential ideas and an ability to navigate the process of using LLMs for psychological assessment.
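The text-preprocessing stage mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration only: the stopword list and sample responses below are invented, and real studies would use validated lexicons, curated transcripts, and a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical stopword list; real pipelines would use a curated lexicon.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "i", "is", "it", "about"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(responses: list[str]) -> Counter:
    """Aggregate token counts across participant responses into
    simple count-based features."""
    counts = Counter()
    for response in responses:
        counts.update(preprocess(response))
    return counts

# Invented example responses, standing in for open-ended survey answers.
responses = [
    "I feel worried about the future.",
    "Lately I feel tired and worried.",
]
counts = bag_of_words(responses)
```

Counts like these are the simplest possible language features; the article's later stages (model selection, embeddings, topic modeling) replace them with learned representations.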

References (showing 10 of 77 papers)
  • Cited by 98
  • 10.1037/h0060838
Speech and personality.
  • Jan 1, 1942
  • Psychological Bulletin
  • F H Sanford

  • Open Access
  • Cited by 9
  • 10.31234/osf.io/9h7aw
Semantic embeddings reveal and address taxonomic incommensurability in psychological measurement
  • Oct 12, 2023
  • Dirk U Wulff + 1 more

  • Open Access
  • Cited by 8
  • 10.18653/v1/2022.emnlp-main.159
Towards Climate Awareness in NLP Research
  • Jan 1, 2022
  • Daniel Hershcovich + 4 more

  • Cited by 15
  • 10.18653/v1/w19-4409
Regression or classification? Automated Essay Scoring for Norwegian
  • Jan 1, 2019
  • Stig Johan Berggren + 2 more

  • Open Access
  • Cited by 9606
  • 10.1037/h0040957
Construct validity in psychological tests.
  • Jul 1, 1955
  • Psychological Bulletin
  • Lee J Cronbach + 1 more

  • Cited by 9
  • 10.1016/j.csl.2024.101663
Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection
  • May 11, 2024
  • Computer Speech & Language
  • Julia Ohse + 8 more

  • Cited by 143
  • 10.1126/science.adg8538
Illusory generalizability of clinical prediction models.
  • Jan 12, 2024
  • Science (New York, N.Y.)
  • Adam M Chekroud + 11 more

  • Cited by 1227
  • 10.1037/0003-066x.56.2.128
Psychological testing and psychological assessment: A review of evidence and issues.
  • Feb 1, 2001
  • American Psychologist
  • Gregory J Meyer + 8 more

  • Cited by 2
  • 10.1007/978-981-97-5672-8_40
Locating and Mitigating Gender Bias in Large Language Models
  • Jan 1, 2024
  • Yuchen Cai + 5 more

  • Open Access
  • Cited by 59
  • 10.1016/j.brat.2021.104013
Digital biomarkers of anxiety disorder symptom changes: Personalized deep learning models using smartphone sensors accurately predict anxiety symptoms from ecological momentary assessments
  • Dec 11, 2021
  • Behaviour Research and Therapy
  • Nicholas C Jacobson + 1 more

Similar Papers
  • Research Article
  • 10.1093/bjrai/ubaf010
Integrating NLP into Radiation Oncology: A Practical Guide to Transformer Architecture and Large Language Models
  • Aug 13, 2025
  • BJR|Artificial Intelligence
  • Reza K Mohammadi + 10 more

Natural Language Processing (NLP) is a key technique for developing Medical Artificial Intelligence (AI) systems that leverage Electronic Health Record (EHR) data to build diagnostic and prognostic models. NLP enables the conversion of unstructured clinical text into structured data that can be fed into AI algorithms. The emergence of transformer architecture and large language models (LLMs) has led to advances in NLP for various healthcare tasks, such as entity recognition, relation extraction, sentence similarity, text summarization, and question-answering. In this article, we review the major technical innovations that underpin modern NLP models and present state-of-the-art NLP applications that employ LLMs in radiation oncology research. However, it is crucial to recognize that LLMs are prone to hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation prior to clinical deployment. As such, we propose a comprehensive framework for assessing the NLP models based on their purpose and clinical fit, technical performance, bias and trust, legal and ethical implications, and quality assurance prior to implementation in clinical radiation oncology. Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology.

  • Research Article
  • 10.1007/s44326-024-00043-w
The journey from natural language processing to large language models: key insights for radiologists
  • Dec 19, 2024
  • Journal of Medical Imaging and Interventional Radiology
  • Salvatore Claudio Fanni + 9 more

Artificial intelligence (AI) has undergone cycles of enthusiasm and stagnation, often referred to as “AI winters.” The introduction of large language models (LLMs), such as OpenAI’s ChatGPT in late 2022, has revitalized interest in AI, particularly within health-care applications, including radiology. The roots of AI in language processing can be traced back to Alan Turing’s 1950 work, which established foundational principles for natural language processing (NLP). Early iterations of NLP primarily concentrated on natural language understanding (NLU) and natural language generation (NLG), but they faced significant challenges related to contextual comprehension and the handling of lengthy text sequences. Recent advancements in NLP have demonstrated considerable promise in automating the analysis of unstructured data, including electronic health records and radiology reports. LLMs, which are based on the transformer architecture introduced in 2017, excel at capturing complex language dependencies and facilitating tasks, such as report generation and clinical decision support. This review critically examines the evolution from traditional NLP to LLMs, highlighting their transformative potential within the field of radiology. Despite the advantages presented by LLMs, challenges persist, including concerns regarding data privacy, the potential for generating misinformation, and the imperative for rigorous validation protocols. Addressing these challenges is crucial for harnessing the full potential of LLMs to enhance diagnostic precision and workflow efficiency in radiology, ultimately improving patient care and outcomes.

  • Research Article
  • 10.2196/72638
Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report
  • Jun 11, 2025
  • Journal of Medical Internet Research
  • Ronghao Li + 8 more

Background: The rapid advancements in natural language processing, particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. Objective: This paper explores LLMs with new prompt engineering technology to enhance model interpretability and improve the prediction performance of pulmonary disease relative to a traditional deep learning model. Methods: A retrospective dataset of 2,965 chest CT radiology reports was constructed. The reports came from 4 cohorts: healthy individuals and patients with pulmonary tuberculosis, lung cancer, and pneumonia. A novel prompt engineering strategy was then proposed that integrates feature summarization (F-Sum), chain-of-thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework. A feature summarization approach, leveraging term frequency-inverse document frequency (TF-IDF) and K-means clustering, was used to extract and distill key radiological findings related to the 3 diseases. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance LLMs' comprehension of disease-related text. In total, 3 state-of-the-art LLMs, GLM-4-Plus, GLM-4-air (Zhipu AI), and GPT-4o (OpenAI), were integrated with the prompt strategy to evaluate their efficiency in recognizing pneumonia, tuberculosis, and lung cancer. A traditional deep learning model, BERT (Bidirectional Encoder Representations from Transformers), was also compared to assess the relative advantage of LLMs. Finally, the proposed method was tested on an external validation dataset consisting of 343 chest computed tomography (CT) reports from another hospital. Results: Compared with the BERT-based prediction model and various other prompt engineering techniques, our method with GLM-4-Plus achieved the best performance on the test dataset, attaining an F1-score of 0.89 and an accuracy of 0.89. On the external validation dataset, the F1-score (0.86) and accuracy (0.92) of the proposed method with GPT-4o were the highest. Compared to the popular strategy with manually selected typical samples (few-shot) and CoT designed by doctors (F1-score=0.83 and accuracy=0.83), the proposed method, which summarized disease characteristics (F-Sum) based on an LLM and automatically generated CoT, performed better (F1-score=0.89 and accuracy=0.90). Although the BERT-based model achieved similar results on the test dataset (F1-score=0.85 and accuracy=0.88), its predictive performance decreased significantly on the external validation set (F1-score=0.48 and accuracy=0.78). Conclusions: These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision-making.
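The TF-IDF step of the feature-summarization approach described above can be illustrated with a stdlib-only sketch. The toy report snippets are invented, and the paper's K-means clustering stage is omitted; this only shows how TF-IDF surfaces terms distinctive to a document.

```python
import math
from collections import Counter

# Invented report snippets, standing in for full radiology reports.
docs = [
    "nodule in right upper lobe",
    "consolidation in left lower lobe",
    "cavitary lesion in right upper lobe",
]

def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Score each term by its within-document frequency, down-weighted
    by how many documents contain it (smoothed IDF)."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            t: (c / len(doc)) * math.log((1 + n) / (1 + df[t]) + 1)
            for t, c in tf.items()
        })
    return scores

scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)  # most distinctive term of doc 0
```

Shared terms like "in" and "lobe" are penalized, so the disease-relevant finding ("nodule") rises to the top; the paper then clusters such terms to build the F-Sum prompt.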

  • Research Article
  • 10.1186/s12887-025-05945-6
Performance of several large language models when answering common patient questions about type 1 diabetes in children: accuracy, comprehensibility and practicality
  • Oct 10, 2025
  • BMC Pediatrics
  • Yasemin Denkboy Ongen + 3 more

Background: The use of large language models (LLMs) in healthcare has expanded significantly with advances in natural language processing. Models such as ChatGPT and Google Gemini are increasingly used to generate human-like responses to questions, including those posed by patients and their families. With the rise in the incidence of type 1 diabetes (T1D) among children, families frequently seek reliable answers regarding the disease. Previous research has focused on type 2 diabetes, but studies on T1D in a pediatric population remain limited. This study aimed to evaluate and compare the performance and effectiveness of different LLMs when answering common questions about T1D. Methods: This cross-sectional, comparative study used questions frequently asked by children with T1D and their parents. Twenty questions were selected from inquiries made to pediatric endocrinologists via social media. The performance of ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Gemini, and Gemini Advanced was assessed using a standard prompt for each model. The responses were evaluated by five pediatric endocrinologists interested in diabetes using the General Quality Scale (GQS), a 5-point Likert scale assessing factors such as accuracy, language simplicity, and empathy. Results: All five LLMs responded to the 20 selected questions, with their performance evaluated by GQS scores. ChatGPT-4o had the highest mean score (3.78 ± 1.09), while Gemini had the lowest (3.40 ± 1.24). Despite these differences, no significant variation was observed between the models (p = 0.103). However, ChatGPT-4o, ChatGPT-4, and Gemini Advanced produced the highest-quality answers compared to ChatGPT-3.5 and Gemini, scoring consistently between 3 and 4 points. ChatGPT-3.5 had the smallest variation in response quality, indicating consistency but not reaching the higher performance levels of the other models. Conclusions: This study demonstrated that all evaluated LLMs performed similarly in answering common questions about T1D. LLMs such as ChatGPT-4o and Gemini Advanced can provide above-average, accurate, and patient-friendly answers to common questions about T1D. Although no significant differences were observed, the latest versions of LLMs show promise for integration into healthcare, provided they continue to be evaluated and improved. Further research should focus on developing specialized LLMs tailored for pediatric diabetes care. Supplementary Information: The online version contains supplementary material available at 10.1186/s12887-025-05945-6.

  • Research Article
  • 10.55041/ijsrem35419
The Nexus of AI and Vector Databases: Revolutionizing NLP with LLMs
  • Jun 14, 2024
  • International Journal of Scientific Research in Engineering and Management
  • Nazeer Shaik

Vector databases play a critical role in the efficiency and functionality of large language models (LLMs), providing scalable and efficient storage and retrieval of high-dimensional vectors. This paper explores the significance of vector databases in the context of LLMs, highlighting their role in information retrieval, similarity search, training, and adaptation processes. Despite the challenges posed by high-dimensional data, vector databases offer invaluable benefits in enhancing the capabilities of LLMs and driving advancements in natural language processing (NLP). Future research and development in this area promise to further optimize the integration and performance of vector databases, fueling continued innovation in LLM applications. Keywords: Vector databases, Large language models (LLMs), Natural language processing (NLP), Information retrieval, Similarity search, Training, Adaptation, Scalability, Efficiency.
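The similarity-search role described above reduces to nearest-neighbour lookup over embedding vectors. Below is a brute-force sketch: the 3-dimensional vectors and document names are invented for illustration, whereas production vector databases store high-dimensional embeddings in approximate-nearest-neighbour indexes.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical tiny "index": document id -> embedding vector.
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.8, 0.2, 0.1],
}

def nearest(query: list[float], k: int = 2) -> list[str]:
    """Exact top-k retrieval by cosine similarity (brute force)."""
    return sorted(index, key=lambda d: cosine(query, index[d]), reverse=True)[:k]

results = nearest([1.0, 0.0, 0.1])
```

The brute-force scan is O(n) per query; the scalability benefits the abstract attributes to vector databases come from replacing it with approximate index structures.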

  • Preprint Article
  • 10.5194/egusphere-egu24-18493
Leveraging recent advances in Large Language Models for the ocean science community
  • Mar 11, 2024
  • Redouane Lguensat

Large Language Models (LLMs) have made significant strides in language understanding, including natural language processing, summarization, and translation, and they have the potential to be applied to a range of climate-related challenges. For instance, LLMs can be leveraged for data cleaning and transformation, and for assisting scientists and engineers in their daily work tasks. For the machine learning community, the year 2023 was arguably the year of breakthroughs in LLM use in production. I present in this work the exciting potential for recent advances in LLMs to revolutionize how the ocean science community interacts with computer code, information gathering, dataset finding, etc. Specifically, I will present simple applications of how these advancements in Natural Language Processing (NLP) can assist the NEMO ocean model community. Examples range from using question-answering systems for efficiently browsing the NEMO documentation to creating conversational agents or chatbots that can assist not only new members wanting to learn about the NEMO model but also confirmed users. An important aspect of this work is relying only on open-source LLMs, evaluating the performance of several models, and discussing the ethical implications of these tools. I also discuss whether using these LLMs blindly, without domain knowledge, is a good idea, as an important chunk of this work can arguably be done by anyone with good computer science skills thanks to the democratization of data science tools and learning materials.

  • Research Article
  • 10.1093/bioinformatics/btaf196
Automated assignment grading with large language models: insights from a bioinformatics course.
  • Jul 1, 2025
  • Bioinformatics (Oxford, England)
  • Pavlin G Poličar + 3 more

Providing students with individualized feedback through assignments is a cornerstone of education that supports their learning and development. Studies have shown that timely, high-quality feedback plays a critical role in improving learning outcomes. However, providing personalized feedback on a large scale in classes with large numbers of students is often impractical due to the significant time and effort required. Recent advances in natural language processing and large language models (LLMs) offer a promising solution by enabling the efficient delivery of personalized feedback. These technologies can reduce the workload of course staff while improving student satisfaction and learning outcomes. Their successful implementation, however, requires thorough evaluation and validation in real classrooms. We present the results of a practical evaluation of LLM-based graders for written assignments in the 2024/25 iteration of the Introduction to Bioinformatics course at the University of Ljubljana. Over the course of the semester, more than 100 students answered 36 text-based questions, most of which were automatically graded using LLMs. In a blind study, students received feedback from both LLMs and human teaching assistants (TAs) without knowing the source, and later rated the quality of the feedback. We conducted a systematic evaluation of six commercial and open-source LLMs and compared their grading performance with human TAs. Our results show that with well-designed prompts, LLMs can achieve grading accuracy and feedback quality comparable to human graders. Our results also suggest that open-source LLMs perform as well as commercial LLMs, allowing schools to implement their own grading systems while maintaining privacy.

  • Research Article
  • Cited by 3
  • 10.1145/3727200.3727217
Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems
  • Dec 1, 2024
  • ACM SIGEnergy Energy Informatics Review
  • Grant Wilkins + 2 more

The rapid adoption of large language models (LLMs) has led to significant advances in natural language processing and text generation. However, the energy consumed by LLM inference remains a major challenge for sustainable AI deployment. To address this problem, we model the workload-dependent energy consumption and runtime of LLM inference tasks on heterogeneous GPU-CPU systems. By conducting an extensive characterization study of several state-of-the-art LLMs and analyzing their energy and runtime behavior across different magnitudes of input prompts and output text, we develop accurate (R² > 0.96) energy and runtime models for each LLM. We employ these models to explore an offline, energy-optimal LLM workload scheduling framework. Through a case study, we demonstrate the advantages of energy- and accuracy-aware scheduling compared to existing best practices.
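A workload-based model of the kind described can be as simple as ordinary least squares over (tokens, joules) pairs. This is a sketch only: the measurement samples below are fabricated, and the paper's actual models are per-LLM and per-hardware, but the fitting procedure and R² check are standard.

```python
# Fabricated measurements: (total tokens processed, joules consumed).
samples = [(100, 52.0), (200, 101.0), (400, 198.0), (800, 405.0)]

def fit_linear(samples: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Ordinary least squares for energy = a * tokens + b.
    Returns (slope a, intercept b, coefficient of determination R^2)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    a = (sum((x - mx) * (y - my) for x, y in samples)
         / sum((x - mx) ** 2 for x, _ in samples))
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in samples)
    ss_tot = sum((y - my) ** 2 for _, y in samples)
    return a, b, 1 - ss_res / ss_tot

a, b, r2 = fit_linear(samples)  # joules-per-token slope and fit quality
```

A scheduler can then compare predicted energy (a * tokens + b) across heterogeneous devices, each with its own fitted (a, b), and route each request to the cheapest one.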

  • Book Chapter
  • Cited by 1
  • 10.3233/faia241060
ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model
  • Oct 16, 2024
  • Zhiwei Liu + 4 more

The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. ConspEmoLLM can be easily applied to identify and classify conspiracy-related text in the real world. The work has been released at https://github.com/lzw108/ConspEmoLLM/.

  • Research Article
  • Cited by 1
  • 10.3233/shti240521
Unlocking the Potential of Free Text in Electronic Health Records with Large Language Models (LLM): Enhancing Patient Safety and Consultation Interactions.
  • Aug 22, 2024
  • Studies in health technology and informatics
  • Pushpa Kumarapeli + 2 more

Computer-mediated clinical consultation, involving clinicians, electronic health record (EHR) systems, and patients, yield rich narrative data. Despite advancements in Natural Language Processing (NLP), these narratives remain underutilised. Free text recording in EHRs allows expressivity, complements structured data from clinical coding systems, and facilitates collaborative care. Large language models (LLMs) excel in understanding and generating natural language, enabling complex dialogue processing. Integrating LLM tools into consultations could harness the untapped potential of free text to identify patient safety concerns, support diagnosis and provide content to enhance clinical-patient interactions. Tailoring LLMs for specific consultation tasks through pre-training and fine-tuning is viable. This paper outlines approaches for adopting LLMs in primary care and suggests that using fine-tuned LLMs with prompt engineering could enhance computer-mediated clinical consultation cost-effectively.

  • Research Article
  • 10.21577/0103-5053.20250067
Recent Advances in Natural Language Processing in Chemistry and Materials Science
  • Jan 1, 2025
  • Journal of the Brazilian Chemical Society
  • Ronaldo Cristiano Prati

Natural Language Processing (NLP) in chemistry and materials science enables computers to understand, analyze, and generate human-readable output related to chemical concepts and materials. With the latest advancements in NLP, text processing at a near-human level has become possible for various tasks. Large Language Models (LLMs) have demonstrated exceptional proficiency in text generation, leading to the redefinition of numerous specific NLP tasks as text generation problems. This review explores recent progress in applying LLMs to specialized domains such as chemistry and materials science. We discuss how LLMs overcome limitations of traditional NLP methods (such as rigid rule-based systems and shallow statistical models) by enabling context-aware interpretation of unstructured literature, flexible entity recognition (e.g., compounds, reactions), and generative tasks. Using the capabilities of LLMs, researchers in these fields can benefit from enhanced text processing, more accurate information extraction, and improved understanding of complex chemical concepts, making LLMs a pivotal tool for accelerating discovery in chemically complex spaces and paving the way for novel tasks such as reaction prediction and molecular design.

  • Research Article
  • 10.1200/jco.2025.43.16_suppl.1558
Use of a large language model (LLM) for pan-cancer automated detection of anti-cancer therapy toxicities and translational toxicity research.
  • Jun 1, 2025
  • Journal of Clinical Oncology
  • Ziad Bakouny + 18 more

1558 Background: Understanding why patients develop adverse events to anti-cancer therapies and predicting the occurrence of these toxicities has lagged behind tumor response biomarker development. This critical gap is primarily due to the limited availability of large-scale curated toxicity data. Here, we leverage advances in natural language processing (Jee J et al., Nature, 2024), pooled clinical trial data, and associated germline sequencing to detect adverse event data and determine clinical and genomic correlates. Methods: We utilized the Llama 3.1 LLM to automatically annotate patient adverse event data for 5 of the most common anti-cancer therapy-related adverse events (adrenal insufficiency, hyperthyroidism, hypothyroidism, colitis, and pneumonitis). To validate LLM predictions at the patient level, we used a pooled institutional dataset with gold-standard prospectively collected adverse event data from 1,754 patients with solid tumors across 675 individual clinical trials. We further validated the LLM predictions at the clinical note level using a subset of 100 manually curated notes. We evaluated note-level and patient-level predictions using sensitivity and specificity. Patient-level time-to-adverse-event predictions were evaluated using Pearson R² coefficients. Common Terminology Criteria for Adverse Events v5.0 was used for toxicity definitions. Results: The patients' average age (standard deviation) was 61.6 (14.5) years, and 836 (47.7%) were female. The most common cancers were non-small cell lung cancer (N=194, 11.1%), soft tissue sarcoma (N=171, 9.7%), breast cancer (N=155, 8.8%), and melanoma (N=129, 7.4%). 44 (2.5%) patients had adrenal insufficiency, 88 (5.0%) colitis, 253 (14.4%) hypothyroidism, 66 (4.4%) hyperthyroidism, and 146 (8.3%) pneumonitis. Among 1,258 patients with complete systemic therapy information available, 422 (33.5%) were treated with immunotherapy and 563 (44.8%) with chemotherapy. The performance metrics for LLM predictions at the note and patient levels are summarized in the table. Conclusions: We demonstrate the ability of an LLM to accurately annotate anti-cancer therapy toxicity data across a large number of patients. This approach is scalable to other toxicities and promises to spur adverse event research. Clinical and genomic correlates of anti-cancer therapy adverse events, using data from all patients with solid tumors with MSK-IMPACT data, will also be presented at the meeting.

Performance metrics for the LLM (note level: N=100 notes; patient level: N=1,754 patients):

Toxicity              | Note Sens. | Note Spec. | Patient Sens. | Patient Spec. | R²
Adrenal insufficiency | 100.0%     | 97.8%      | 97.7%         | 94.7%         | 98.2%
Colitis               | 66.7%      | 99.0%      | 94.3%         | 80.4%         | 89.2%
Hyperthyroidism       | 57.1%      | 100.0%     | 74.0%         | 91.4%         | 98.7%
Hypothyroidism        | 100.0%     | 88.9%      | 88.1%         | 74.0%         | 96.1%
Pneumonitis           | 76.9%      | 97.7%      | 98.6%         | 70.1%         | 83.9%
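The note- and patient-level metrics reported above are standard confusion-matrix ratios. In the sketch below, the raw counts are hypothetical (the abstract reports only percentages, not confusion matrices); they are chosen merely so the output matches the reported adrenal-insufficiency figures of 97.7% sensitivity and 94.7% specificity.

```python
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts, NOT from the study: 44 true cases and ~1,710
# non-cases, tuned to reproduce the reported 97.7% / 94.7% pair.
sens, spec = sens_spec(tp=43, fn=1, tn=1620, fp=90)
```

High specificity with moderate sensitivity (as for hyperthyroidism in the table) means few false alarms but missed cases, a trade-off that matters when screening large patient populations.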

  • Preprint Article
  • 10.2196/preprints.72638
Enhancing Pulmonary Disease Prediction Using Large Language Models with Feature Summarization and Hybrid Retrieval-Augmented Generation (Preprint)
  • Feb 13, 2025
  • Ronghao Li + 8 more

BACKGROUND The rapid advancements in natural language processing (NLP), particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. OBJECTIVE To address these limitations, this study proposes a novel prompt engineering strategy that integrates feature summarization, chain of thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework. METHODS A feature summarization approach, leveraging TF-IDF and K-means clustering, was employed to extract and distill key radiological findings. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance LLMs’ comprehension of disease-related text. The proposed strategy was evaluated using a multicenter dataset containing radiology reports on pneumonia, tuberculosis, and lung cancer, with three state-of-the-art LLMs: GLM-4-plus, GLM-4-air, and GPT-4o. RESULTS Comparative analyses were performed against a BERT-based prediction model and various other prompt engineering techniques. Our strategy achieved superior performance, attaining an accuracy of 0.8947 and an F1 score of 0.8887 on the primary dataset, alongside an accuracy of 0.9167 and an F1 score of 0.8631 on an external validation dataset of radiology reports. CONCLUSIONS These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision making.

  • Research Article
  • Cited by 26
  • 10.3390/rs15133232
The Potential of Visual ChatGPT for Remote Sensing
  • Jun 22, 2023
  • Remote Sensing
  • Lucas Prado Osco + 4 more

Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. These are known as Visual LLMs and one notable model is Visual ChatGPT, which combines ChatGPT’s LLM capabilities with visual computation to enable effective image analysis. These models’ abilities to process images based on textual inputs can revolutionize diverse fields, and while their application in the remote sensing domain remains unexplored, it is important to acknowledge that novel implementations are to be expected. Thus, this is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle the aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model’s limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds a significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.

  • Research Article
  • 10.1016/j.neunet.2025.107856
A survey of low-bit large language models: Basics, systems, and algorithms.
  • Dec 1, 2025
  • Neural networks : the official journal of the International Neural Network Society
  • Ruihao Gong + 12 more


More from: Advances in Methods and Practices in Psychological Science
  • Research Article
  • 10.1177/25152459251379432
Do Musicians Have Better Short-Term Memory Than Nonmusicians? A Multilab Study
  • Oct 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Massimo Grassi + 99 more

  • Research Article
  • 10.1177/25152459251380452
A Tutorial on Distribution-Free Uncertainty Quantification Using Conformal Prediction
  • Oct 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Tim Kaiser + 1 more

  • Research Article
  • 10.1177/25152459251375445
Consistent and Precise Description of Research Outputs Could Improve Implementation of Open Science
  • Oct 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Evan Mayo-Wilson + 3 more

  • Research Article
  • 10.1177/25152459251351287
Citing Decisions in Psychology: A Roadblock to Cumulative and Inclusive Science
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Katherine M Lawson + 4 more

  • Research Article
  • 10.1177/25152459251360642
A Fragmented Field: Construct and Measure Proliferation in Psychology
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Farid Anvari + 6 more

  • Research Article
  • 10.1177/25152459251343043
Does Truth Pay? Investigating the Effectiveness of the Bayesian Truth Serum With an Interim Payment: A Registered Report
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Claire M Neville + 1 more

  • Research Article
  • 10.1177/25152459251361013
The DECIDE Framework: Describing Ethical Choices in Digital-Behavioral-Data Explorations
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Heather Shaw + 5 more

  • Research Article
  • 10.1177/25152459251343582
Large Language Models for Psychological Assessment: A Comprehensive Overview
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Jocelyn Brickman + 2 more

  • Research Article
  • 10.1177/25152459251355585
On Partial Versus Full Mediation and the Importance of Effect Sizes
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Thomas Ledermann + 2 more

  • Research Article
  • 10.1177/25152459251348431
Bestiary of Questionable Research Practices in Psychology
  • Jul 1, 2025
  • Advances in Methods and Practices in Psychological Science
  • Tamás Nagy + 18 more
