Doku-Assist: Proactive Knowledge Retrieval for Service-Desk Agents: A Feasibility Study on On-Premise LLMs for Data Privacy and Compliance

Abstract

In medium-sized organizations, frequent turnover of first-level support agents creates challenges for new agents, who, lacking experience, struggle to find existing documentation relevant to user issues. Consequently, these agents escalate tickets to second-level support professionals, increasing their workload. A proactive knowledge discovery and assistance system for first-level service desk agents could help by analyzing tickets with a large language model (LLM) and then finding and presenting relevant documentation using Retrieval-Augmented Generation (RAG) techniques. However, when cloud-based LLMs perform inference on sensitive information, data sovereignty is compromised and confidential ticket content risks being leaked, because local information is transmitted to the cloud for processing. To address this issue, we built a system on local LLMs so that its operation does not compromise the privacy and confidentiality of ticket content and wiki documentation, keeping all sensitive data on-premise and secure. Our system, Doku-Assist, proactively finds and presents documentation to first-level support agents, assisting with issue resolution without replacing the human agent. It integrates a DokuWiki-derived knowledge base with the ticket system Znuny. To address privacy concerns, we evaluated the system with artificial tickets, documentation, and customer issues based on real-world experience. A second-level support agent assessed the utility of the user interface and of the documents proactively discovered and presented, concluding that the documents Doku-Assist presents are useful for proactively filling the knowledge gap of new first-level service desk agents. We conclude that data privacy and legal compliance can be achieved by using local LLMs.
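The abstract does not say how Doku-Assist ranks wiki pages against a ticket. As a rough illustration of the retrieval step of a RAG pipeline that never leaves the local machine, the Python sketch below scores hypothetical wiki pages against a ticket with bag-of-words cosine similarity (a simple stand-in for the embedding search a production system would more likely use) and assembles a prompt that would be handed to a locally hosted LLM. The page IDs, page texts, and the functions `retrieve` and `build_prompt` are invented for this example; the actual DokuWiki and Znuny integration is not modeled, and the model call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory stand-in for DokuWiki pages (page id -> text).
WIKI_PAGES = {
    "vpn:setup": "How to configure the VPN client and reset the VPN password",
    "printer:driver": "Installing the printer driver on Windows laptops",
    "mail:quota": "Mailbox quota exceeded: archive old mail or request more storage",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(ticket_text: str, pages: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Rank wiki pages against a ticket; all data stays in local memory."""
    query = Counter(tokenize(ticket_text))
    scored = [(pid, cosine(query, Counter(tokenize(body)))) for pid, body in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Keep the top-k pages that share at least one term with the ticket.
    return [item for item in scored[:k] if item[1] > 0]


def build_prompt(ticket_text: str, hits: list[tuple[str, float]], pages: dict[str, str]) -> str:
    """Assemble a RAG prompt for a locally hosted LLM (the model call is not shown)."""
    context = "\n".join(f"[{pid}] {pages[pid]}" for pid, _ in hits)
    return (
        f"Ticket:\n{ticket_text}\n\n"
        f"Relevant documentation:\n{context}\n\n"
        "Suggest which pages help the agent resolve the ticket."
    )


ticket = "User cannot connect to VPN after password reset"
hits = retrieve(ticket, WIKI_PAGES)
print(hits[0][0])  # vpn:setup
print(build_prompt(ticket, hits, WIKI_PAGES))
```

Replacing the term counts with vectors from a locally run embedding model would keep the same structure while typically matching paraphrases better; in either case, no ticket text is sent off the machine.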

Similar Papers
  • Conference Article
  • 10.1145/3711875.3729128
CrossLM: A Data-Free Collaborative Fine-Tuning Framework for Large and Small Language Models
  • Jun 23, 2025
  • Yongheng Deng + 5 more

While large language models (LLMs) are endowed with broad knowledge, their task-specific performance is often suboptimal. Fine-tuning LLMs with task-specific data from diverse nodes is necessary, but this data is typically safeguarded and not shared publicly due to privacy concerns. A common solution involves downstream nodes downloading the LLM locally and fine-tuning it with their proprietary data. However, owners often regard pre-trained LLMs as valuable assets and are reluctant to share them. Additionally, the significant computational resources required by LLMs make local fine-tuning impractical for many nodes. To mitigate these problems, this paper proposes CrossLM, a data-free collaborative fine-tuning framework for large and small language models. CrossLM enables resource-constrained nodes to train smaller language models (SLMs) using their private task-specific data. These SLMs are subsequently leveraged to promote the task-specific natural language generation and understanding capabilities of the LLMs. Simultaneously, the SLMs of nodes also benefit from enhancement by the fine-tuned LLMs. In this way, CrossLM avoids sharing private data and proprietary LLMs, and also reduces the resource requirements of nodes. Through extensive experiments across a range of benchmark tasks and popular language models, we demonstrate that CrossLM significantly boosts the task-specific performance of both LLMs and SLMs while preserving the generalization capabilities of LLMs.

  • Research Article
  • 10.55041/ijsrem36608
Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse
  • Aug 10, 2024
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Mr Aarush Varma + 1 more

This research paper delves into the inherent vulnerabilities and potential threats posed by large language models (LLMs), focusing on their implications across diverse applications such as natural language processing and data privacy. The study aims to identify and analyze these risks comprehensively, emphasizing the importance of mitigating strategies to prevent exploitation and misuse in LLM deployments. In recent years, LLMs have revolutionized fields like automated content generation, sentiment analysis, and conversational agents, yet their immense capabilities also raise significant security concerns. Vulnerabilities such as bias amplification, adversarial attacks, and unintended data leakage can undermine trust and compromise user privacy. Through a systematic examination of these challenges, this paper proposes safeguarding measures crucial for responsibly harnessing the potential of LLMs while minimizing associated risks. It underscores the necessity of rigorous security protocols, including robust encryption methods, enhanced authentication mechanisms, and continuous monitoring frameworks. Furthermore, the research discusses regulatory implications and ethical considerations surrounding LLM usage, advocating for transparency, accountability, and stakeholder engagement in policymaking and deployment practices. By synthesizing insights from current literature and real-world case studies, this study provides a comprehensive framework that helps stakeholders (developers, policymakers, and users) navigate the complex landscape of LLM security effectively. Ultimately, this research aims to inform future advancements in LLM technology, ensuring its safe and beneficial integration into various domains while mitigating potential risks to individuals and society as a whole.
Keywords— Adversarial attacks on LLMs, Bias in LLMs, Data privacy in LLMs, Ethical considerations LLMs, Exploitation of LLMs, Large Language Models (LLMs), Misuse of LLMs, Mitigation strategies for LLMs, Natural Language Processing (NLP), Regulatory frameworks LLMs, Responsible deployment of LLMs, Risks of LLMs, Security implications of LLMs, Threats to LLMs, Vulnerabilities in LLMs.

  • Front Matter
  • Cited by 1
  • 10.3389/frai.2024.1516832
Editorial: Large language models in work and business.
  • Nov 29, 2024
  • Frontiers in artificial intelligence
  • Şadi Evren Şeker

In today’s rapidly evolving business landscape, Artificial Intelligence (AI), and specifically Large Language Models (LLMs), are redefining how organizations operate, make decisions, and engage with customers. AI-driven technologies have become indispensable, providing businesses with powerful tools to streamline operations, derive actionable insights from vast data, and foster more meaningful customer interactions. For business leaders, scholars, and practitioners alike, understanding the transformative potential of AI is not just advantageous; it is essential to staying competitive in an increasingly data-driven world.

This editorial delves into recent scholarly advancements in LLM applications within business contexts, analyzing studies that explore AI’s potential across various domains, from decision support to creative industries. By introducing a structured framework, this editorial highlights key insights and contributions from recent studies, assessing their value to academia and industry. The following comparative analysis sheds light on how these innovations shape our understanding of AI’s role in business while pointing to future research directions.

Puyt and Madsen's (2024) study stands out as a foundational exploration of LLM accuracy, assessing ChatGPT-4's ability to recount the history of the SWOT analysis, a vital business strategy tool. Their findings reveal that, while ChatGPT-4 effectively conveys general concepts, it struggles with detailed historical information, often producing inaccuracies or "hallucinations." This gap underscores the need for LLMs to be trained with verified academic data, particularly for strategic business applications that demand precision. This study not only contributes to the literature by proposing methods to evaluate AI accuracy in historical contexts but also highlights the importance of rigorous information vetting in industry settings where reliability is crucial.

In contrast, Raikov et al. (2024) explore a hybrid intelligence model that combines LLM capabilities with explainable AI (XAI) principles to enhance human-machine collaboration. Their approach emphasizes cognitive semantics, improving transparency and decision-making efficiency. The hybrid model's real-time adaptability addresses the needs of complex, regulated industries such as finance and healthcare, where trust in AI decisions is paramount. Academically, this study provides a valuable addition to XAI literature by demonstrating how LLMs can bridge the gap between AI autonomy and human oversight, making it a model for future human-AI interactions in complex business environments.

Another significant study by Mariotti and colleagues (2024) examines the integration of LLMs with enterprise knowledge graphs to enhance data-driven decision-making. By enabling organizations to leverage knowledge graphs for more accurate and scalable data retrieval, this research provides a robust framework for businesses seeking efficient knowledge management systems. The academic contribution here lies in advancing the dialogue between LLMs and knowledge graphs, emphasizing ethical data handling and quality standards essential for industry applications. For enterprises, the study offers practical solutions to achieve streamlined data management, balancing automation with privacy and security. 2024) take a different approach, investigating LLMs' role in creative industries, specifically within fashion design. They introduce a hybrid intelligence model that supports creative processes, allowing AI to complement rather than replace human ingenuity. While LLMs in this field demonstrate potential in automating repetitive design tasks and enhancing customer personalization, the study reveals limitations in AI's ability to handle spatial and stylistic nuances. This study's academic contribution lies in promoting human-AI co-creation, inspiring further research into AI applications across diverse creative sectors, including media and marketing.

Collectively, these studies not only illuminate LLMs' transformative potential in business but also highlight critical ethical and operational considerations. Ensuring accuracy, transparency, and data privacy is vital to responsibly integrating AI into business workflows. Future research should focus on enhancing LLM accuracy, refining hybrid intelligence models, and exploring creative AI applications, all while maintaining ethical standards. As LLMs evolve, interdisciplinary collaborations will be essential to harness their full potential, making AI an ethical, effective, and innovative force in the business world.

  • Research Article
  • Cited by 794
  • 10.1016/j.hcc.2024.100211
A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly
  • Mar 1, 2024
  • High-Confidence Computing
  • Yifan Yao + 5 more

  • Research Article
  • Cited by 13
  • 10.3390/cancers16162830
Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
  • Aug 12, 2024
  • Cancers
  • Oskitz Ruiz Sarrias + 15 more

Large Language Models (LLMs), such as the GPT model family from OpenAI, have demonstrated transformative potential across various fields, especially in medicine. These models can understand and generate contextual text, adapting to new tasks without specific training. This versatility can revolutionize clinical practices by enhancing documentation, patient interaction, and decision-making processes. In oncology, LLMs offer the potential to significantly improve patient care through the continuous monitoring of chemotherapy-induced toxicities, a task that is often unmanageable for human resources alone. However, existing research has not sufficiently explored the accuracy of LLMs in identifying and assessing subjective toxicities based on patient descriptions. This study aims to fill this gap by evaluating the ability of LLMs to accurately classify these toxicities, facilitating personalized and continuous patient care. This comparative pilot study assessed the ability of an LLM to classify subjective toxicities from chemotherapy. Thirteen oncologists evaluated 30 fictitious cases created using expert knowledge and OpenAI's GPT-4. These evaluations, based on the CTCAE v.5 criteria, were compared with those of a contextualized LLM. Metrics such as the mode and mean of responses were used to gauge consensus. The accuracy of the LLM was analyzed in both general and specific toxicity categories, considering types of errors and false alarms. The study's results are intended to justify further research involving real patients. The study revealed significant variability in oncologists' evaluations due to the lack of interaction with fictitious patients. Measured against the mean evaluations, the LLM achieved an accuracy of 85.7% in general categories and 64.6% in specific categories; of its errors, 96.4% were mild and 3.6% severe. False alarms occurred in 3% of cases.
When comparing the LLM's performance to that of expert oncologists, individual accuracy ranged from 66.7% to 89.2% for general categories and 57.0% to 76.0% for specific categories. The 95% confidence intervals for the median accuracy of oncologists were 81.9% to 86.9% for general categories and 67.6% to 75.6% for specific categories. These benchmarks highlight the LLM's potential to achieve expert-level performance in classifying chemotherapy-induced toxicities. The findings demonstrate that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to expert oncologists. The LLM achieved 85.7% accuracy in general categories and 64.6% in specific categories. While the model's general category performance falls within expert ranges, specific category accuracy requires improvement. The study's limitations include the use of fictitious cases, lack of patient interaction, and reliance on audio transcriptions. Nevertheless, LLMs show significant potential for enhancing patient monitoring and reducing oncologists' workload. Future research should focus on the specific training of LLMs for medical tasks, conducting studies with real patients, implementing interactive evaluations, expanding sample sizes, and ensuring robustness and generalization in diverse clinical settings. This study concludes that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to expert oncologists. The LLM's performance in general toxicity categories is within the expert range, but there is room for improvement in specific categories. LLMs have the potential to enhance patient monitoring, enable early interventions, and reduce severe complications, improving care quality and efficiency. Future research should involve specific training of LLMs, validation with real patients, and the incorporation of interactive capabilities for real-time patient interactions. 
Ethical considerations, including data accuracy, transparency, and privacy, are crucial for the safe integration of LLMs into clinical practice.

  • Preprint Article
  • 10.2196/preprints.71916
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline (Preprint)
  • Jan 29, 2025
  • Hongyi Li + 2 more

BACKGROUND Large language models (LLMs) can generate outputs understandable by humans, such as answers to medical questions and radiology reports. With the rapid development of LLMs, clinicians face a growing challenge in determining the most suitable algorithms to support their work. OBJECTIVE We aimed to provide clinicians and other health care practitioners with systematic guidance in selecting an LLM that is relevant and appropriate to their needs and facilitate the integration process of LLMs in health care. METHODS We conducted a literature search of full-text publications in English on clinical applications of LLMs published between January 1, 2022, and March 31, 2025, on PubMed, ScienceDirect, Scopus, and IEEE Xplore. We excluded papers from journals below a set citation threshold, as well as papers that did not focus on LLMs, were not research based, or did not involve clinical applications. We also conducted a literature search on arXiv within the same investigated period and included papers on the clinical applications of innovative multimodal LLMs. This led to a total of 270 studies. RESULTS We collected 330 LLMs and recorded their application frequency in clinical tasks and frequency of best performance in their context. On the basis of a 5-stage clinical workflow, we found that stages 2, 3, and 4 are key stages in the clinical workflow, involving numerous clinical subtasks and LLMs. However, the diversity of LLMs that may perform optimally in each context remains limited. GPT-3.5 and GPT-4 were the most versatile models in the 5-stage clinical workflow, applied to 52% (29/56) and 71% (40/56) of the clinical subtasks, respectively, and they performed best in 29% (16/56) and 54% (30/56) of the clinical subtasks, respectively. General-purpose LLMs may not perform well in specialized areas as they often require lightweight prompt engineering methods or fine-tuning techniques based on specific datasets to improve model performance. 
Most LLMs with multimodal abilities are closed-source models and therefore lack transparency and do not allow model customization or fine-tuning for specific clinical tasks; they may also pose challenges regarding data protection and privacy, which are common requirements in clinical settings. CONCLUSIONS In this review, we found that LLMs may help clinicians in a variety of clinical tasks. However, we did not find evidence of generalist clinical LLMs successfully applicable to a wide range of clinical tasks. Therefore, their clinical deployment remains challenging. On the basis of this review, we propose an interactive online guideline for clinicians to select suitable LLMs by clinical task. With a clinical perspective and free of unnecessary technical jargon, this guideline may be used as a reference to successfully apply LLMs in clinical settings.
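The subtask percentages reported for GPT-3.5 and GPT-4 can be checked against the raw counts. The short Python snippet below recomputes each share out of the 56 clinical subtasks and confirms that it rounds to the reported whole-number percentage; it uses only the figures stated in the abstract.

```python
# Subtask counts and rounded percentages as reported in the abstract.
TOTAL_SUBTASKS = 56
reported = {
    "GPT-3.5 applied": (29, 52),
    "GPT-4 applied": (40, 71),
    "GPT-3.5 best": (16, 29),
    "GPT-4 best": (30, 54),
}

for label, (count, pct) in reported.items():
    exact = 100 * count / TOTAL_SUBTASKS
    print(f"{label}: {count}/{TOTAL_SUBTASKS} = {exact:.1f}% (reported {pct}%)")
    # Each exact share should round to the reported whole-number percentage.
    assert round(exact) == pct
```

All four shares (51.8%, 71.4%, 28.6%, and 53.6%) round to the values given in the text.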

  • Research Article
  • Cited by 29
  • 10.1108/jebde-08-2023-0015
Unraveling the landscape of large language models: a systematic review and future perspectives
  • Dec 19, 2023
  • Journal of Electronic Business & Digital Economics
  • Qinxu Ding + 4 more

Purpose: The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach: Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to analyze the data after it had been meticulously cleaned and preprocessed. BERTopic leverages transformer-based language models such as bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.

Findings: The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications: This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value: This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLMs’ evolution.

  • Research Article
  • Cited by 38
  • 10.1016/j.hcc.2025.100300
On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review
  • Jun 1, 2025
  • High-Confidence Computing
  • Biwei Yan + 6 more

  • Conference Article
  • Cited by 1
  • 10.54941/ahfe1006669
Enhancing Thematic Analysis with Local LLMs: A Scientific Evaluation of Prompt Engineering Techniques
  • Jan 1, 2025
  • AHFE international
  • Timothy Meyer + 2 more

Thematic Analysis (TA) is a powerful tool for human factors, HCI, and UX researchers to gather system usability insights from qualitative data like open-ended survey questions. However, TA is both time-consuming and difficult, requiring researchers to review and compare hundreds, thousands, or even millions of pieces of text. Recently, this has driven many to explore using Large Language Models (LLMs) to support such an analysis. Yet LLMs have their own processing limitations and usability challenges when implemented reliably as part of a research process, especially when working with a large corpus of data that exceeds LLM context windows. These challenges are compounded when using locally hosted LLMs, which may be necessary to analyze sensitive and/or proprietary data. However, little human factors research has rigorously examined how various prompt engineering techniques can augment an LLM to overcome these limitations and improve usability. Accordingly, in the present paper, we investigate the impact of several prompt engineering techniques on the quality of LLM-mediated TA. Using a local LLM (Llama 3.1 8B) to ensure data privacy, we developed four LLM variants with progressively complex prompt engineering techniques and used them to extract themes from user feedback regarding the usability of a novel knowledge management system prototype. The LLM variants were as follows:

1. A “baseline” variant with no prompt engineering or scalability.
2. A “naïve batch processing” variant that sequentially analyzed small batches of the user feedback to generate a single list of themes.
3. An “advanced batch processing” variant that built upon the naïve variant by adding prompt engineering techniques (e.g., chain-of-thought prompting).
4. A “cognition-inspired” variant that incorporated advanced prompt engineering techniques and kept a working memory-like log of themes and their frequency.

Contrary to conventional approaches to studying LLMs, which largely rely upon descriptive statistics (e.g., % improvement), we systematically applied a set of evaluation methods from behavioral science and human factors. We performed three stages of evaluation of the outputs of each LLM variant: we compared the LLM outputs to our team’s original TA, we had human factors professionals (N = 4) rate the quality and usefulness of the outputs, and we compared the Inter-Rater Reliability (IRR) of other human factors professionals (N = 2) attempting to code the original data with the outputs generated by each variant. Results demonstrate that even small, locally deployed LLMs can produce high-quality TA when guided by appropriate prompts. While the “baseline” variant performed surprisingly well for small datasets, we found that the other, scalable methods depended on advanced prompt engineering techniques to be successful. Only our novel “cognition-inspired” approach performed as well as the “baseline” variant in qualitative and quantitative comparisons of ratings and coding IRR. This research provides practical guidance for human factors researchers looking to integrate LLMs into their qualitative analysis workflows, disentangling and uncovering the importance of context window limitations, batch processing strategies, and advanced prompt engineering techniques. The findings suggest that local LLMs can serve as valuable and scalable tools in thematic analysis.
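The abstract describes the “cognition-inspired” variant only at a high level. A minimal Python sketch of the idea, assuming that each batch is small enough to fit into one prompt and that a running theme-frequency log is carried from batch to batch, might look like this. All function names are hypothetical, and `extract_themes` is a keyword-lookup stub standing in for the call to the local Llama 3.1 8B model; it is not the authors' prompt or code.

```python
from collections import Counter


def batches(items, size):
    """Yield consecutive slices of `items`, each small enough for one prompt."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_themes(batch, known_themes):
    """Hypothetical stand-in for a local LLM call.

    A real variant would put the batch and `known_themes` (the current log)
    into the prompt; this stub ignores `known_themes` and uses a keyword
    lookup so that the sketch stays runnable.
    """
    keywords = {"slow": "performance", "search": "search quality", "confusing": "navigation"}
    found = []
    for comment in batch:
        for word, theme in keywords.items():
            if word in comment.lower():
                found.append(theme)
    return found


def cognition_inspired_ta(feedback, batch_size=2):
    """Process feedback batch by batch while keeping a running theme-frequency log."""
    theme_log = Counter()  # working-memory-like log carried across batches
    for batch in batches(feedback, batch_size):
        theme_log.update(extract_themes(batch, list(theme_log)))
    return theme_log.most_common()


feedback = [
    "Search results are slow to load",
    "The menu is confusing",
    "Search never finds the right page",
    "Pages load slow on VPN",
]
print(cognition_inspired_ta(feedback))
```

Because the log persists across batches, themes seen early keep accumulating counts later, which is the property that distinguishes this variant from the naïve batch-processing one in the description above.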

  • Research Article
  • Cited by 1
  • 10.1108/mlag-01-2025-0001
Large language models for automated grading in geotechnics
  • Nov 28, 2025
  • Machine Learning and Data Science in Geotechnics
  • Enrico Soranzo

Purpose The purpose of this study is to explore the application of automated grading systems in geotechnics using large language models (LLMs) and cosine similarity for enhanced assessment and educational content generation. By training and testing LLMs on synthetic and real student data, the study seeks to develop robust systems for grading technical reports and open-ended questions, aligned with industry standards. Additionally, it aims to enhance student learning through auto-grading, immediate feedback and content generation, while addressing ethical considerations such as data privacy and fairness. Ultimately, the study strives to demonstrate the potential of LLMs to improve consistency, efficiency and educational outcomes. Design/methodology/approach The study employs a mixed-methods approach to develop and validate automated grading systems in geotechnics. Initially, correct answers were generated manually and synthetically using a generative pre-trained transformer model, with synthetic answers compared to correct ones via cosine similarity. Real student answers underwent similar evaluation. A Web-based tool was created to assess responses in real time, providing dynamic feedback. Additionally, LLMs were fine-tuned on geotechnics textbooks and validated using synthetic and real student data. Anonymized student project reports were graded automatically, showcasing the potential and limitations of LLMs in consistent grading and educational content generation. Ethical considerations were addressed throughout. Findings The study demonstrated the potential of LLMs in geotechnics education by developing ML-driven systems for grading and content generation. The grading system, using cosine similarity and LLMs, provided consistent and objective assessments comparable to those of human graders. Immediate feedback on open-ended questions enhanced learning outcomes, enabling students to address knowledge gaps effectively.
Fine-tuning LLMs with geotechnics textbooks and industry standards facilitated the generation of accurate, relevant questions and answers, further improved by retrieval-augmented generation (RAG). Data augmentation techniques enhanced model robustness, while ethical considerations, including data privacy, fairness, and transparency, ensured responsible deployment and fostered trust among stakeholders. Originality/value This study offers originality and value by pioneering the application of LLMs and cosine similarity for automated grading in geotechnics education, a domain with limited exploration in educational technology. By integrating RAG and fine-tuning LLMs with domain-specific textbooks, it bridges the gap between advanced machine learning techniques and practical applications in engineering education. The development of real-time feedback tools and robust grading systems enhances both student learning and instructional efficiency. Furthermore, addressing ethical considerations such as fairness and data privacy sets a precedent for responsible artificial intelligence (AI) deployment, contributing to the broader adoption of AI in academia.
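The study compares answers with reference answers via cosine similarity. A minimal Python sketch of that comparison follows, assuming the answers have already been converted into numeric vectors (for example, by an embedding model). The toy three-dimensional vectors and the linear mapping from similarity to points in the hypothetical `grade` function are illustrative assumptions, not the study's grading scheme.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)  # Euclidean norms (Python 3.8+)
    return dot / norm if norm else 0.0


def grade(answer_vec, reference_vec, max_points=10):
    """Map similarity to points; the linear scaling and clipping at 0 are assumptions."""
    sim = cosine_similarity(answer_vec, reference_vec)
    return round(max(sim, 0.0) * max_points, 1)


reference = [1.0, 0.0, 1.0]  # toy vector for a reference answer
print(grade([1.0, 0.0, 1.0], reference))  # identical direction: full marks
print(grade([1.0, 0.0, 0.0], reference))  # partial overlap
print(grade([0.0, 1.0, 0.0], reference))  # orthogonal: no credit
```

A real system would likely calibrate the similarity-to-points mapping against human grades rather than scale linearly, since embedding similarities rarely spread evenly across the 0 to 1 range.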

  • Research Article
  • Cited by 202
  • 10.1186/s41073-023-00133-5
Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review
  • May 18, 2023
  • Research Integrity and Peer Review
  • Mohammad Hosseini + 1 more

Background: The emergence of systems based on large language models (LLMs) such as OpenAI’s ChatGPT has created a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks, including writing peer review reports, could improve productivity. Given the significance of peer reviews in the existing scholarly publication landscape, exploring the challenges and opportunities of using LLMs in peer review seems urgent. After the generation of the first scholarly outputs with LLMs, we anticipate that peer review reports too will be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks.

Methods: To investigate the potential impact of using LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer. These include 1) reviewers’ role, 2) editors’ role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT’s performance regarding identified issues.

Results: LLMs have the potential to substantially alter the role of both peer reviewers and editors. By supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher quality review and address issues of review shortage. However, the fundamental opacity of LLMs’ training data, inner workings, data handling, and development processes raises concerns about potential biases, confidentiality and the reproducibility of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major enhancements in a short period and expect LLMs to continue developing.

Conclusions: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While potentially beneficial to the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews and decision letters, reviewers and editors should disclose their use and accept full responsibility for data security and confidentiality, and for their reports’ accuracy, tone, reasoning and originality.

  • Research Article
  • Cited by 15
  • 10.1093/bjrai/ubae019
Large language models in cancer: potentials, risks, and safeguards.
  • Dec 20, 2024
  • BJR artificial intelligence
  • Md Muntasir Zitu + 8 more

This review examines the use of large language models (LLMs) in cancer, analysing articles sourced from PubMed, Embase, and Ovid MEDLINE published between 2017 and 2024. Our search strategy included terms related to LLMs, cancer research, risks, safeguards, and ethical issues, focusing on studies that used text-based data. Fifty-nine articles were included in the review, grouped into three categories: quantitative studies of LLMs, chatbot-focused studies, and qualitative discussions of LLMs in cancer. The quantitative studies highlight LLMs' advanced natural language processing (NLP) capabilities, while the chatbot-focused articles demonstrate their potential in clinical support and data management. The qualitative research underscores the broader implications of LLMs, including risks and ethical considerations. Our findings suggest that LLMs, notably ChatGPT, have potential in data analysis, patient interaction, and personalized treatment in cancer care. However, the review identifies critical risks, including data biases and ethical challenges. We emphasize the need for regulatory oversight, targeted model development, and continuous evaluation. In conclusion, integrating LLMs into cancer research offers promising prospects but requires a balanced approach focusing on accuracy, ethical integrity, and data privacy. This review underscores the need for further study and encourages responsible exploration and application of artificial intelligence in oncology.

  • Research Article
  • Cited by 289
  • 10.1038/s41368-023-00239-y
ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model
  • Jul 28, 2023
  • International Journal of Oral Science
  • Hanyao Huang + 10 more

ChatGPT, a lightweight conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with billions of parameters. LLMs have attracted considerable interest among researchers and practitioners for their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, namely automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic multi-modal LLM AI system for dental clinical application. While LLMs offer significant potential benefits, challenges such as data privacy, data quality, and model bias need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.

  • Research Article
  • Cited by 4
  • 10.1016/j.rcim.2025.103154
Human–robot collaborative visual inspection with Large Language Models
  • Apr 1, 2026
  • Robotics and Computer-Integrated Manufacturing
  • Osama Tasneem + 1 more

Human–Robot Collaboration (HRC) is gaining traction in advanced manufacturing as industries shift from isolated robotic systems to more collaborative environments. This transition is supported by advancements in automation and, more recently, generative AI. Large Language Models (LLMs) offer new possibilities for intuitive human–robot interaction through natural language. However, natural language remains of very limited use as an interaction medium because of linguistic ambiguity, environmental noise, pronunciation variability, and diverse phrasing styles. Furthermore, cloud-based deployment of LLMs raises concerns about ergonomics and data privacy, especially for industries and countries governed by strict regulatory requirements. To address these challenges, we present a fully offline, closed-loop robotic assistant for visual inspection tasks in HRC settings. The system supports speech-based interaction, where user instructions are transcribed via a Speech-to-Text (STT) model and processed by a locally deployed, code-generating LLM. Guided by a structured prompt, the LLM produces custom responses for robot perception and manipulation. Inspection paths are generated relative to spatial axes or in specific directions and executed with real-time feedback through a Text-to-Speech (TTS) interface, allowing much closer interaction with the robot assistant. The system applies a hybrid control method, in which the higher-level instructions are generated by the LLM together with a perception pipeline, and the lower-level robot control is managed by ROS for safety and reliability. The system is evaluated across a range of experiments, including local LLM comparisons, prompt engineering effectiveness, and inspection performance in both simulated and real-world industrial use cases. Results demonstrate the system’s capability to handle complex inspection tasks on objects of varied sizes and geometries, confirming its practicality and robustness in realistic deployment settings.
Code and videos are available open source at: https://github.com/CuriousLad1000/RoboSpection
Highlights:
  • A system that enables natural language interaction for human–robot collaboration
  • Locally hosted LLM that generates task-specific robot code from speech
  • Hybrid architecture that combines high-level LLM planning with low-level ROS execution
  • Evaluation of the system on a real-life industrial visual inspection use case in real-world environments, across varying levels of complexity
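The hybrid control idea in this abstract, where the LLM proposes high-level instructions and a conventional controller executes them, can be illustrated with a small validation gate. The sketch below is hypothetical and is not taken from the RoboSpection repository: it assumes the structured prompt makes the LLM emit a JSON command with invented fields `action`, `axis`, `length_m` and `steps`, rejects anything outside a whitelist and assumed workspace bounds, and only then turns the command into evenly spaced inspection waypoints along one spatial axis. The ROS execution layer is omitted; the waypoints stand in for what would be handed to it.

```python
import json

# Hypothetical command schema and limits; not taken from the paper or its repository.
ALLOWED_ACTIONS = {"inspect"}
AXES = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}
MAX_LENGTH_M = 1.0   # assumed workspace limit for one inspection sweep
MAX_STEPS = 50


def parse_command(llm_output: str) -> dict:
    """Validate the LLM's JSON command before any motion is planned."""
    cmd = json.loads(llm_output)  # malformed JSON raises JSONDecodeError (a ValueError)
    if not isinstance(cmd, dict):
        raise ValueError("command must be a JSON object")
    if cmd.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {cmd.get('action')!r}")
    if cmd.get("axis") not in AXES:
        raise ValueError(f"unknown axis: {cmd.get('axis')!r}")
    length = float(cmd.get("length_m", 0.0))
    steps = int(cmd.get("steps", 0))
    if not 0.0 < length <= MAX_LENGTH_M:
        raise ValueError("length_m out of bounds")
    if not 2 <= steps <= MAX_STEPS:
        raise ValueError("steps out of bounds")
    return {"axis": cmd["axis"], "length_m": length, "steps": steps}


def axis_waypoints(centroid: tuple[float, float, float], axis: str,
                   length_m: float, steps: int,
                   standoff_m: float = 0.2) -> list[tuple[float, float, float]]:
    """Evenly spaced points along one axis, centred on the object and raised by standoff_m in z."""
    ux, uy, uz = AXES[axis]
    cx, cy, cz = centroid
    waypoints = []
    for i in range(steps):
        t = -length_m / 2 + i * length_m / (steps - 1)
        waypoints.append((cx + t * ux, cy + t * uy, cz + t * uz + standoff_m))
    return waypoints


if __name__ == "__main__":
    reply = '{"action": "inspect", "axis": "x", "length_m": 0.4, "steps": 3}'
    cmd = parse_command(reply)
    for point in axis_waypoints((0.5, 0.0, 0.1), **cmd):
        print(point)  # these would be passed to the ROS-managed low-level controller
```

Keeping the LLM's output in a narrow, checkable schema means a malformed or hallucinated command fails with a `ValueError` instead of reaching the motion controller.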

  • Research Article
  • Cited by 38
  • 10.1145/3682068
Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning
  • Mar 13, 2025
  • ACM Transactions on Management Information Systems
  • Xiao-Yang Liu + 6 more

The surge in interest in, and application of, large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet challenges arise: (1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and (2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise to weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.
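The two mechanisms named in this abstract, Gaussian noise on weight updates and low-rank adaptation, can be sketched together. The code below is a generic illustration under stated assumptions, not the DP-LoRA algorithm as published: it assumes each client's flattened LoRA update is clipped to an L2 bound `clip_norm`, perturbed with Gaussian noise of standard deviation `noise_multiplier * clip_norm`, and averaged by the server; privacy accounting (turning the noise level into an (ε, δ) guarantee) is left out. The parameter count shows why low-rank adaptation reduces communication: a full update of a d × k weight matrix costs d·k numbers, whereas the LoRA factors B (d × r) and A (r × k) cost r(d + k).

```python
import math
import random


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Numbers sent per round for one d x k weight matrix: (full update, rank-r LoRA update)."""
    return d * k, r * (d + k)


def clip_and_noise(update: list[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> list[float]:
    """Gaussian mechanism applied to one client's flattened LoRA factors (A and B)."""
    norm = math.sqrt(sum(x * x for x in update))
    # Rescale so the L2 norm is at most clip_norm; this bounds each client's influence.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    # Noise standard deviation is proportional to the clipping bound (the sensitivity).
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]


def federated_average(updates: list[list[float]]) -> list[float]:
    """Server side: coordinate-wise mean of the noisy client updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


if __name__ == "__main__":
    full, lora = lora_param_counts(4096, 4096, 8)
    print(full, lora, full // lora)  # 16777216 65536 256

    rng = random.Random(0)
    d, k, r = 16, 16, 2  # toy sizes: each client sends r * (d + k) = 64 numbers
    clients = [[rng.uniform(-1.0, 1.0) for _ in range(r * (d + k))] for _ in range(3)]
    noisy = [clip_and_noise(u, clip_norm=1.0, noise_multiplier=0.5, rng=rng) for u in clients]
    aggregated = federated_average(noisy)
    print(len(aggregated))  # 64
```

For a 4096 × 4096 projection with rank 8, each client sends 65,536 numbers instead of 16,777,216, a 256-fold reduction. Note that averaging A and B separately, as this sketch does, only approximates averaging the products BA; how the published method handles this is not stated in the abstract.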
