Large Language Models in the Business World: Usage Areas, Benefits, Impacts, and Future Perspectives

Abstract

Large Language Models (LLMs) have become a central component of digital transformation in business organizations. Rather than considering LLMs holistically, existing research tends to focus on their applications, organizational benefits, employment effects, or implementation challenges as separate dimensions. To address this gap, this study provides an integrated qualitative examination of the use of LLMs in the business world, focusing on their application areas, organizational benefits, impacts on employment and job transformation, limitations, and future potential. The study adopts a qualitative document analysis design. A systematic review of publications from 2018 to 2025 was conducted using internationally recognized academic databases and reputable industry sources. Qualitative content analysis was undertaken on the collected documents using a deductive-inductive thematic framework aligned with the study’s research questions. LLMs are extensively applied across multiple business functions, including human resources, customer service, data analysis and reporting, content creation and marketing, financial analysis, market intelligence, legal document processing, corporate training, and automation-integrated systems. LLM adoption enhances organizational efficiency, supports cost optimization, and improves managerial decision-making by enabling rapid, data-driven, and scenario-based insights. In terms of employment, LLMs primarily drive task-level automation and job role transformation rather than direct job displacement, increasing the importance of higher-order skills such as critical thinking, supervision, and digital literacy. Despite these benefits, the study identifies persistent challenges related to ethical risks, data security and privacy, model bias, hallucinations, and governance mechanisms. This study contributes to the literature by offering an integrated qualitative framework that simultaneously links business value creation, workforce transformation, and implementation challenges, positioning LLM adoption as a holistic socio-technical and organizational phenomenon rather than a purely technological advancement.

Similar Papers
  • Front Matter
  • Citations: 1
  • 10.3389/frai.2024.1516832
Editorial: Large language models in work and business.
  • Nov 29, 2024
  • Frontiers in artificial intelligence
  • Şadi Evren Şeker

In today’s rapidly evolving business landscape, Artificial Intelligence (AI), and specifically Large Language Models (LLMs), are redefining how organizations operate, make decisions, and engage with customers. AI-driven technologies have become indispensable, providing businesses with powerful tools to streamline operations, derive actionable insights from vast data, and foster more meaningful customer interactions. For business leaders, scholars, and practitioners alike, understanding the transformative potential of AI isn’t just advantageous; it’s essential to staying competitive in an increasingly data-driven world. This editorial delves into recent scholarly advancements in LLM applications within business contexts, analyzing studies that explore AI’s potential across various domains, from decision support to creative industries. By introducing a structured framework, this editorial highlights key insights and contributions from recent studies, assessing their value to academia and industry. The following comparative analysis sheds light on how these innovations shape our understanding of AI’s role in business while pointing to future research directions. Puyt and Madsen's (2024) study stands out as a foundational exploration of LLM accuracy, assessing ChatGPT-4's ability to recount the history of the SWOT analysis, a vital business strategy tool. Their findings reveal that, while ChatGPT-4 effectively conveys general concepts, it struggles with detailed historical information, often producing inaccuracies or "hallucinations." This gap underscores the need for LLMs to be trained with verified academic data, particularly for strategic business applications that demand precision. This study not only contributes to the literature by proposing methods to evaluate AI accuracy in historical contexts but also highlights the importance of rigorous information vetting in industry settings where reliability is crucial. In contrast, Raikov et al. (2024) explore a hybrid intelligence model that combines LLM capabilities with explainable AI (XAI) principles to enhance human-machine collaboration. Their approach emphasizes cognitive semantics, improving transparency and decision-making efficiency. The hybrid model's real-time adaptability addresses the needs of complex, regulated industries such as finance and healthcare, where trust in AI decisions is paramount. Academically, this study provides a valuable addition to XAI literature by demonstrating how LLMs can bridge the gap between AI autonomy and human oversight, making it a model for future human-AI interactions in complex business environments. Another significant study by Mariotti and colleagues (2024) examines the integration of LLMs with enterprise knowledge graphs to enhance data-driven decision-making. By enabling organizations to leverage knowledge graphs for more accurate and scalable data retrieval, this research provides a robust framework for businesses seeking efficient knowledge management systems. The academic contribution here lies in advancing the dialogue between LLMs and knowledge graphs, emphasizing ethical data handling and quality standards essential for industry applications. For enterprises, the study offers practical solutions to achieve streamlined data management, balancing automation with privacy and security. 2024) take a different approach, investigating LLMs' role in creative industries, specifically within fashion design. They introduce a hybrid intelligence model that supports creative processes, allowing AI to complement rather than replace human ingenuity. While LLMs in this field demonstrate potential in automating repetitive design tasks and enhancing customer personalization, the study reveals limitations in AI's ability to handle spatial and stylistic nuances. This study's academic contribution lies in promoting human-AI co-creation, inspiring further research into AI applications across diverse creative sectors, including media and marketing. Collectively, these studies not only illuminate LLMs' transformative potential in business but also highlight critical ethical and operational considerations. Ensuring accuracy, transparency, and data privacy is vital to responsibly integrating AI into business workflows. Future research should focus on enhancing LLM accuracy, refining hybrid intelligence models, and exploring creative AI applications, all while maintaining ethical standards. As LLMs evolve, interdisciplinary collaborations will be essential to harness their full potential, making AI an ethical, effective, and innovative force in the business world.

  • Research Article
  • 10.28945/5693
Unlocking the Potential of Large Language Models in Education: Factors Influencing Adoption by Instructional Designers and Academics
  • Jan 1, 2026
  • Journal of Information Technology Education: Research
  • Katherine L Fourie + 2 more

Aim/Purpose: The study investigates the factors influencing the acceptance and utilisation of large language models (LLMs) (predictor variables of LLM usage), such as ChatGPT, in learning design by instructional designers and university-teaching academics from various countries. Background: Large language models (LLMs) have exploded onto the scene, transforming the landscape of learning design. Instructional designers and university teaching academics have been overburdened with content creation for their teaching programmes, and the arrival of LLMs will help in this regard by developing more interactive content that drives student engagement and, in turn, contributes to student success. Since LLMs are a relatively new phenomenon, little is known about the factors influencing their acceptance in learning design; therefore, this research is needed, as learning design principles are the bedrock of student engagement and success. Methodology: A cross-sectional correlational quantitative study was employed. Data was collected using an online questionnaire posted on social media, including LinkedIn, from 203 instructional designers and university teaching academics. Purposive and snowball sampling methods were used to target instructional designers and university teaching academics at colleges and universities worldwide. Participants were asked to share the survey link with fellow instructional designers and university-teaching academics in their communities. The factor structure of the data was determined using exploratory factor analysis. Nonetheless, the factor structure derived from the LLMs did not entirely reflect the original configuration of the Unified Theory of Acceptance and Use of Technology (UTAUT3), as certain predictors appeared to coalesce, indicating LLMs’ unique nature in learning design. Confirmatory factor analysis was used to verify the fit of the data to the measurement model. First-order and second-order structural modelling were used to identify the structural relationships among the variables. Contribution: The study determines significant factors for the acceptance of LLMs by instructional designers and academic teaching staff in learning design, enabling possible opportunities for best practices in the field through interventions to optimize LLM usage. The study applies the technology acceptance model to the emerging LLM technology and extends the technology acceptance model by adding the trust construct as a predictor variable. Findings: The structural analysis results indicated that the ingrained LLM practices, LLM peer-driven expectations, innovative propensity towards LLM adoption, reliability and provider trust in LLMs, and ease of use and support influenced perceived LLM benefits and usage, but community standards and infrastructure had no influence. The second-order structural equation modelling indicated that perceived LLM benefits and usage and ingrained LLM habits contributed most to the learning design. Recommendations for Practitioners: Teaching academics and instructional designers must use LLMs in designing content, assessments, and interactive learning activities, and attend LLM training workshops on prompting and best practices in integrating LLMs into learning and teaching to see their benefits; regular use of LLMs will then lead to trust and innovation in LLM usage, enhancing learning design and improving student learning outcomes. Recommendation for Researchers: Researchers must use mixed methods approaches to gain a deeper understanding of the factors influencing LLMs. Since habit and perceived LLM benefits and usage contributed the most variance to learning design, researchers must investigate strategies that optimise these factors in learning design, such as effective intervention strategies that can help form positive LLM habits. In addition, the findings provide researchers with a starting point for future research. Further, researchers must investigate interventions that optimise the influence of personal innovativeness and trust, which contributed the least variance to learning design, hence unlocking the potential of LLMs in learning design through innovative, responsible, and ethical use. Impact on Society: The use of LLMs in learning design has a high possibility of transforming education, specifically the learning design landscape. Using LLMs will free up more time for teaching academics and instructional designers so that they spend more time on higher-order thinking skill demands. Consequently, the students will be exposed to more engaging and interactive content, resulting in improved learning outcomes. Future Research: Future research must include context-derived external variables in technology acceptance models, such as levels of prompting competencies, to provide a deeper understanding of LLMs. In addition, future research must be based on the application and impact of LLMs on student engagement and success, and their attainment of 21st-century skills.

  • Conference Article
  • 10.5593/sgem2025/2.1/s07.04
FAMILIARITY OF THE UNIVERSITY STUDENTS OF LANDSCAPE ENGINEERING WITH LARGE LANGUAGE MODELS
  • Aug 15, 2025
  • International Multidisciplinary Scientific GeoConference SGEM ...
  • Elena Aydin + 1 more

The recent advances in machine learning have led to exponential improvement in the field of artificial intelligence and large language models (LLMs). For university study and university students, LLMs can offer opportunities in several areas. On the other hand, output from LLMs should be handled using critical thinking. In our research, we wanted to investigate the level of awareness of publicly available LLMs among the bachelor and master students (n=22) of the Landscape Engineering program at the Slovak University of Agriculture in Nitra and the purposes for which LLMs are used in the Slovak language. All the respondents had heard about ChatGPT as an example of LLMs. The awareness of other LLMs was lower, as expected, in the decreasing order of Gemini, Copilot and Perplexity. The majority of students found the AI tools at least “somewhat useful” for their daily tasks. More than 10% of the students from both groups concluded that AI tools provide substantial support for their education. According to our results, students generally use LLMs mostly for writing parts of assignments and projects (74.3%) and for AI-powered web search services (63.6%). In both cases more than 50% of respondents found LLMs useful for these tasks. In general, the bachelor group proved to be more experienced in using LLMs for different purposes. In contrast to the master group, the bachelor group found LLMs very useful for explaining topics that they could not understand, using LLMs as a tutor (nearly 70%), as well as for brainstorming and exploring new ideas (46%). These last two usage areas in particular can provide considerable support to traditional classroom education.

  • Research Article
  • 10.65521/ijacect.v14i3s.1636
An Iterative Comprehensive Evaluation of Large Language and Vision Models in Medical AI: Benchmarks, Adaptability, and Deployment Challenges
  • Dec 22, 2025
  • International Journal on Advanced Computer Engineering and Communication Technology
  • Wani H Bisen + 1 more

PRISMA principles provide a thorough analysis of current advances in large language models (LLMs) and multimodal transformers for medical applications. As LLMs like GPT-4, BioGPT, Med-PaLM, and hybrid frameworks like COMCARE enter clinical processes, thorough synthesis is essential to increase performance, methodological adaptability, and implementation practicality in many healthcare situations. Their creativity in medical report writing, decision support, and diagnosis is notable, but the literature has not established a cohesive taxonomy that evaluates these models by uniform metrics, domain-specific generalizability, and ethical acceptability. Over 40 studies examined radiology report production, clinical question responding, cognitive assessment, and causal reasoning. After testing vision-language transformer architectures like PEGASUS and ETB MII for automated imaging-based reporting, graph-based reasoning was used to evaluate drug safety and interpretability of knowledge-integrated models like KELLM. As needed, BLEU, ROUGE, F1 score, CIDEr, and qualitative evaluations were used. Domain-adapted and hybrid models improve diagnostic accuracy, task-specific explainability, and clinician workload differently. Model illusion, biases, hostile manipulation, and resource-intensive fine-tuning persist. The report recommends strong benchmarking, public evaluation standards, and ethical frameworks for LLMs in high-stakes medical applications. This study defines LLMs' therapeutic utility and recommends infrastructure, ethics, and technology for safe and successful integration. This effort prepares scalable, interpretable, and equitable medical AI systems.

  • Research Article
  • Citations: 1
  • 10.1016/j.identj.2025.109344
Evaluating Retrieval-Augmented Generation-Large Language Models for Infective Endocarditis Prophylaxis: Clinical Accuracy and Efficiency.
  • Feb 1, 2026
  • International dental journal
  • Paak Rewthamrongsris + 5 more

The use of large language models (LLMs) in healthcare is expanding. Retrieval-augmented generation (RAG) addresses key LLM limitations by grounding responses in domain-specific, up-to-date information. This study evaluated RAG-augmented LLMs for infective endocarditis (IE) prophylaxis in dental procedures, comparing their performance with non-RAG models assessed in our previous publication using the same question set. A pilot study also explored the utility of an LLM as a clinical decision support tool. An established IE prophylaxis question set from previous research was used to ensure comparability. Ten LLMs integrated with RAG were tested using MiniLM L6 v2 embeddings and FAISS to retrieve relevant content from the 2021 American Heart Association IE guideline. Models were evaluated across five independent runs, with and without a preprompt ('You are an experienced dentist'), a prompt-engineering technique used in previous research to improve LLM accuracy. Three RAG-LLMs were compared to their native (non-RAG) counterparts benchmarked in the previous study. In the pilot study, 10 dental students (5 undergraduate, 5 postgraduate in oral and maxillofacial surgery) completed the questionnaire unaided, then again with assistance from the best performing LLM. Accuracy and task time were measured. DeepSeek Reasoner achieved the highest mean accuracy (83.6%) without preprompting, while Grok 3 beta reached 90.0% with preprompting. The lowest accuracy was observed for Claude 3.7 Sonnet, at 42.1% without preprompts and 47.1% with preprompts. Preprompting improved performance across all LLMs. RAG's impact on accuracy varied by model. Claude 3.7 Sonnet showed the highest response consistency without preprompting; with preprompting, Claude 3.5 Sonnet and DeepSeek Reasoner matched its performance. DeepSeek Reasoner also had the slowest response time. 
In the pilot study, LLM support slightly improved postgraduate accuracy, slightly reduced undergraduate accuracy, and significantly increased task time for both. While RAG and prompting enhance LLM performance, real-world utility in education remains limited. LLMs with RAG provide rapid and accessible support for clinical decision-making. Nonetheless, their outputs are not always accurate and may not fully reflect evolving medical and dental knowledge. It is crucial that clinicians and students approach these tools with digital literacy and caution, ensuring that professional judgment remains central.
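
The retrieval and preprompt steps described in the abstract above can be illustrated with a minimal Python sketch. This is not the study's implementation: the study used MiniLM L6 v2 embeddings and FAISS, whereas this sketch swaps in a toy bag-of-words vector and a brute-force cosine ranking so that it runs with the standard library alone. The passages are placeholders, not text from the American Heart Association guideline, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Preprompt wording quoted from the abstract; everything else here is assumed.
PREPROMPT = "You are an experienced dentist"

# Placeholder passages standing in for chunks of a guideline document.
PASSAGES = [
    "placeholder passage about patient risk categories",
    "placeholder passage about dental procedures and prophylaxis",
    "placeholder passage about unrelated administrative matters",
]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question (brute force)."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, use_preprompt=False, k=2):
    """Assemble the text sent to a model: optional preprompt, context, question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    parts = [PREPROMPT] if use_preprompt else []
    parts.append("Context:\n" + context)
    parts.append("Question: " + question)
    return "\n\n".join(parts)


if __name__ == "__main__":
    question = "Which dental procedures need prophylaxis?"
    print(build_prompt(question, PASSAGES, use_preprompt=True, k=1))
```

Running each question with `use_preprompt` set to `True` and then `False` mirrors the with/without-preprompt comparison the abstract reports; the model call itself is deliberately left out.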

  • Research Article
  • Citations: 9
  • 10.1609/aaai.v37i13.26879
Exploring Social Biases of Large Language Models in a College Artificial Intelligence Course
  • Jun 26, 2023
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Skylar Kolisko + 1 more

Large neural network-based language models play an increasingly important role in contemporary AI. Although these models demonstrate sophisticated text generation capabilities, they have also been shown to reproduce harmful social biases contained in their training data. This paper presents a project that guides students through an exploration of social biases in large language models. As a final project for an intermediate college course in Artificial Intelligence, students developed a bias probe task for a previously-unstudied aspect of sociolinguistic or sociocultural bias they were interested in exploring. Through the process of constructing a dataset and evaluation metric to measure bias, students mastered key technical concepts, including how to run contemporary neural networks for natural language processing tasks; construct datasets and evaluation metrics; and analyze experimental results. Students reported their findings in an in-class presentation and a final report, recounting patterns of predictions that surprised, unsettled, and sparked interest in advocating for technology that reflects a more diverse set of backgrounds and experiences. Through this project, students engage with and even contribute to a growing body of scholarly work on social biases in large language models.

  • Research Article
  • 10.1609/aaai.v39i22.34554
Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Yanrui Du + 4 more

Large Language Models (LLMs) have gained significant attention for their exceptional performance across various domains. Despite their advancements, concerns persist regarding their implicit bias, which often leads to negative social impacts. Therefore, it is essential to identify the implicit bias in LLMs and investigate the potential threat posed by it. Our study focused on a specific type of implicit bias, termed the “Yes-No” implicit bias, which refers to LLMs' inherent tendency to favor “Yes” or “No” responses to a single instruction. By comparing the probability of LLMs generating a series of “Yes” versus “No” responses, we observed different inherent response tendencies exhibited by LLMs when faced with different instructions. To further investigate the impact of such bias, we developed an attack method called Implicit Bias In-Context Manipulation, attempting to manipulate LLMs' behavior. Specifically, we explored whether the “Yes” implicit bias could manipulate “No” responses into “Yes” in LLMs' responses to malicious instructions, leading to harmful outputs. Our findings revealed that the “Yes” implicit bias brings a significant security threat, comparable to that of carefully designed attack methods. Moreover, we offered a comprehensive analysis from multiple perspectives to deepen the understanding of this security threat, emphasizing the need for ongoing improvement in LLMs' security.

  • Conference Article
  • Citations: 312
  • 10.1145/3582269.3615599
Gender bias and stereotypes in Large Language Models
  • Nov 5, 2023
  • Hadas Kotek + 2 more

Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs’ behavior with respect to gender stereotypes, a known issue for prior models. We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset, which is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men and women’s occupations. Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person’s gender; (b) these choices align with people’s perceptions better than with the ground truth as reflected in official job statistics; (c) LLMs in fact amplify the bias beyond what is reflected in perceptions or the ground truth; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity; (e) LLMs provide explanations for their choices that are factually inaccurate and likely obscure the true reason behind their predictions. That is, they provide rationalizations of their biased behavior. This highlights a key property of these models: LLMs are trained on imbalanced datasets; as such, even with the recent successes of reinforcement learning with human feedback, they tend to reflect those imbalances back at us. As with other types of societal biases, we suggest that LLMs must be carefully tested to ensure that they treat minoritized individuals and communities equitably.

  • Research Article
  • Citations: 22
  • 10.3390/app15020671
Large Language Models as Evaluators in Education: Verification of Feedback Consistency and Accuracy
  • Jan 11, 2025
  • Applied Sciences
  • Hyein Seo + 6 more

The recent advancements in large language models (LLMs) have brought significant changes to the field of education, particularly in the generation and evaluation of feedback. LLMs are transforming education by streamlining tasks like content creation, feedback generation, and assessment, reducing teachers’ workload and improving online education efficiency. This study aimed to verify the consistency and reliability of LLMs as evaluators by conducting automated evaluations using various LLMs based on five educational evaluation criteria. The analysis revealed that while LLMs were capable of performing consistent evaluations under certain conditions, a lack of consistency was observed both among evaluators and across models for other criteria. Notably, low agreement among human evaluators correlated with reduced reliability in LLM evaluations. Furthermore, variations in evaluation results were influenced by factors such as prompt strategies and model architecture, highlighting the complexity of achieving reliable assessments using LLMs. These findings suggest that while LLMs have the potential to transform educational systems, careful selection and combination of models are essential to improve their consistency and align their performance with human evaluators in educational settings.

  • Research Article
  • Citations: 6
  • 10.1007/s43681-024-00613-4
Parity benchmark for measuring bias in LLMs
  • Dec 17, 2024
  • AI and Ethics
  • Shmona Simpson + 3 more

Bias in Large Language Models (LLMs) can perpetuate harmful stereotypes, reinforce inequities, and lead to unfair outcomes in applications from automated content moderation to decision-making systems. These biases also limit the applicability of LLMs in areas such as law, medicine, education, and finance. This paper introduces a benchmark designed to measure and evaluate biases in LLMs. It addresses the protected characteristics on which bias is often enacted, including gender, race, socioeconomic status, and intersectional identities. By systematically assessing LLMs using an expert-curated dataset, the benchmark tests for the biases present in recent large language models like GPT-4o, Llama 3, Gemini and Claude 3.5 Sonnet. This paper details the construction of the benchmark, including the selection of the categories (Ageism, Colonial bias, Colorism, Disability, Homophobia, Racism, Sexism, and Supremacism), the evaluation metrics, and the implementation of testing protocols. Through empirical analysis, we evaluated the LLMs and observed significant performance disparities in multiple categories. All LLMs had an accuracy of at least 74% on average when tested for knowledge regarding these categories. However, this threshold was reduced when LLMs were required to interpret, reason, or deduce. This was especially true regarding homophobia, colonial praxis, and disability. GPT-4 performed best regarding content knowledge followed closely by Claude 3.5 Sonnet, while Gemma-1.1 performed best with interpretation. Gemini 1.5 Pro was better overall than its predecessor, Gemini 1.0, demonstrating that rapid improvement in bias mitigation is possible. These findings highlight the critical need for ongoing monitoring and mitigation strategies to address bias in generative AI systems. 
We hope that it will serve as a critical tool for policymakers aiming to promote fairness and provide an opportunity for LLM developers to leverage the benchmark to unlock use cases where they may better serve all of us.

  • Research Article
  • Citations: 293
  • 10.1038/s41368-023-00239-y
ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model
  • Jul 28, 2023
  • International Journal of Oral Science
  • Hanyao Huang + 10 more

ChatGPT, a lite and conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with billions of parameters. LLMs have stirred up much interest among researchers and practitioners for their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic Multi-Modal LLM AI system for dentistry clinical application. While LLMs offer significant potential benefits, the challenges, such as data privacy, data quality, and model bias, need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.

  • Research Article
  • 10.1371/journal.pone.0348819
Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity.
  • Jan 1, 2026
  • PloS one
  • Qiufeng Jia + 7 more

Large language models are increasingly integrated into healthcare for clinical decision support and patient communication. Although these models can pass explicit social bias tests, they may retain implicit biases (latent associations between social groups and attributes) that could influence medical judgment. This study systematically evaluates the presence, magnitude, and behavioral impact of implicit biases in large language models within the medical domain across six high-stakes categories: gender, race, socioeconomic status, health conditions, religion, and healthcare systems. It is a descriptive cross-sectional study using a multi-faceted evaluation framework, based on computational analysis of 10 mainstream global large language models, including proprietary models (ChatGPT-4o, Gemini-2.0-Flash) and open-source models (DeepSeek-V3, Qwen3). We constructed 24 medical bias datasets across six categories. Bias was assessed using three methods: (1) the Large Language Model Word Association Test, a prompt-based method for revealing implicit biases; (2) the Large Language Model Relative Decision Test, a strategy for detecting subtle discrimination in situational decision-making; (3) Paired-Prompt Analysis, used to examine whether implicit associations predict discriminatory decisions. All 10 models exhibited systematic implicit biases (Mean IAT Bias > 0) across all categories, with the strongest biases observed in Race (Mean = 0.61) and Socioeconomic Status (Mean = 0.56). Advanced reasoning capabilities (Chain-of-Thought) did not significantly reduce bias magnitude. Crucially, stronger implicit associations significantly predicted discriminatory choices in downstream medical decision tasks (p < 0.001). Current safety alignment techniques fail to eliminate implicit biases in large language models within the medical domain. These latent associations translate into biased decision-making, posing risks for health equity. 
Future development must prioritize representational debiasing over superficial alignment. Furthermore, healthcare professionals must embrace a stance of "AI vigilance": they should critically evaluate algorithmic outputs as fallible "second opinions" rather than objective truths, thereby ensuring that human judgment remains the ultimate safeguard for equitable patient care.

  • Research Article
  • 10.70088/50wkze06
How to Use LLMs Ethically in Academic Writing?
  • May 18, 2025
  • Education Insights
  • Tiantian Yu

This paper presents an experimental study based on selected Large Language Models (LLMs) and Artificial Intelligence Generated Content (AIGC) detection systems, conducted within a mixed-methods research paradigm that combines empirical validation and Qualitative Content Analysis (QCA). The empirical validation process consists of a condition optimization experiment and the main experiment, and the materials for qualitative content analysis are derived directly from these experimental outputs. In the experiments, six LLMs are evaluated using four different AIGC detectors. Through analysis of the content generated by these LLMs, the existing theoretical framework for applying LLMs in academic writing, referred to as the authors' checklist, is revised. The updated framework refines the checklist step for assessing and amending the accuracy of AI-generated content. It contains five steps for authors writing with the assistance of LLMs: Intellectual Contribution, Accuracy of Conceptions, Accuracy of Demonstrations, Academic Competency, and Transparency. Additionally, it emphasizes the importance of authors' innovation and proficiency in prompting LLMs when using them ethically in academic writing.

  • Research Article
  • Cited by 61
  • 10.1016/j.cose.2024.103964
From COBIT to ISO 42001: Evaluating cybersecurity frameworks for opportunities, risks, and regulatory compliance in commercializing large language models
  • Jun 22, 2024
  • Computers & Security
  • Timothy R Mcintosh + 7 more

  • Research Article
  • 10.2196/80289
Nurses’ Perspectives on Evidence Dissemination Barriers and Large Language Model–Based Support: Qualitative Study Using Focus Groups and Nominal Group Technique
  • Nov 7, 2025
  • Journal of Medical Internet Research
  • Junyi Ruan + 4 more

Background: Current evidence dissemination methods fall short of meeting clinical nurses’ needs, hindering the implementation of evidence-based nursing practice. Large language models (LLMs), with their advanced natural language processing capabilities, offer potential as innovative tools to facilitate evidence dissemination. However, general-purpose LLMs typically lack domain-specific knowledge and are insufficient to support effective evidence dissemination in clinical contexts. It is essential to develop artificial intelligence tools tailored to nurses’ needs and preferences to enhance evidence dissemination.
Objective: The aim of this study is to identify the challenges and barriers clinical nurses face in disseminating evidence, examine their perspectives on the use of existing LLMs to support evidence dissemination, and explore their needs and preferences regarding an LLM-based nursing evidence question-answering system.
Methods: This qualitative study used a combined method of focus group discussions and the nominal group technique (NGT). Using purposive sampling, nurses with diverse specialties, professional titles, and years of experience were recruited, resulting in a total of 22 clinical nurses who completed the entire study. A total of 2 focus group discussions were conducted online via Tencent Meeting between November and December 2024 to explore the challenges and barriers nurses face in disseminating evidence, as well as their perspectives on using existing LLMs to support evidence dissemination. The data were analyzed using qualitative content analysis following the approach of Graneheim and Lundman. Subsequently, the NGT was used between March and April 2025 to identify nurses’ needs and preferences for the system to be developed. To overcome geographical constraints and participants’ busy schedules, the NGT was conducted entirely online, using online questionnaires and WeChat groups. Overall, 2 rounds of voting were conducted to determine the priority ranking of the functionalities.
Results: The focus groups yielded 3 main themes and 7 subthemes. The 3 main themes were (1) pathways for evidence dissemination among nurses, (2) barriers that hinder the effective dissemination of evidence, and (3) advantages and limitations of using LLMs to support evidence dissemination. The limitations of current LLMs served as the foundation for nurses’ subsequent reflections in the nominal group discussions on the desired functions of a newly developed LLM. The NGT sessions ultimately identified 9 desired functions. After prioritization, the top 3 ranked functions were evidence-based, high-quality question-answering; evidence source provision; and personalized evidence recommendation.
Conclusions: The current evidence dissemination process faces multiple barriers. LLMs hold promise as innovative tools to support evidence dissemination but require further refinement. Clinical nurses have identified key functional needs, guiding the development of LLMs specifically tailored to clinical nursing practice.
