Privacy preserving large language models: ChatGPT case study based vision and framework

Imdad Ullah, Najm Hassan, Sukhpal Singh Gill, Basem Suleiman, Tariq Ahamed Ahanger, Zawar Shah, Junaid Qadir, Salil S Kanhere

doi:10.1049/blc2.12091

Abstract

Generative Artificial Intelligence (AI) tools based on Large Language Models (LLMs) use billions of parameters to analyse large datasets extensively, extract critical information such as context, specific details, and identifying information, use this information in the training process, and generate responses to user queries. The extracted data also contain sensitive information, which seriously threatens user privacy and makes users reluctant to use such tools. This article proposes a conceptual model called PrivChatGPT, a privacy‐preserving model for LLMs that consists of two main components: preserving user privacy during data curation/pre‐processing, and preserving private context and the private training process for large‐scale data. To demonstrate the applicability of PrivChatGPT, it is shown how a private mechanism could be integrated into the existing LLM training model to protect user privacy; specifically, differential privacy and private training using Reinforcement Learning (RL) are employed. Privacy‐level probabilities are associated with the document contents, including private contextual information, and with metadata; these are used to evaluate the disclosure probability loss for an individual's private information. Once differential privacy is applied, the privacy loss is measured and the uncertainty (randomness) is evaluated using entropy. The model recursively evaluates the level of privacy guarantees and the uncertainty of public databases and resources at each update, whenever new information is added for training. To critically evaluate the use of differential privacy for private LLMs, it is hypothetically compared with other mechanisms, namely Blockchain, private information retrieval, randomisation, obfuscation, anonymisation, and Tor, across several performance measures: model performance and accuracy, computational complexity, privacy vs. utility, training latency, vulnerability to attacks, and resource consumption. It is concluded that differential privacy, randomisation, and obfuscation can degrade the trained models' utility and performance, whereas Tor, Blockchain, and Private Information Retrieval (PIR) may introduce additional computational complexity and high training latency. The authors believe the proposed model could serve as a benchmark for privacy‐preserving LLMs in generative AI tools.
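The abstract does not specify the exact mechanism or its parameters. The minimal Python sketch below assumes the standard Laplace mechanism applied to a counting query with sensitivity 1, and illustrates three ideas the abstract names: adding calibrated noise, measuring the privacy loss between two neighbouring datasets, and quantifying the resulting uncertainty with Shannon entropy. It also tracks cumulative ε across training updates using basic sequential composition, one simple way to re‐evaluate the privacy guarantee each time new data are added. All names (`laplace_mechanism`, `privacy_loss`, `shannon_entropy`, `PrivacyAccountant`), the counting query, and the budget values are hypothetical illustrations, not the PrivChatGPT implementation.

```python
# Illustrative sketch only: assumes the Laplace mechanism and basic sequential
# composition; this is NOT the PrivChatGPT implementation.
import math
import random
from collections import Counter


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value plus Laplace(0, sensitivity / epsilon) noise (epsilon-DP)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # A Laplace(0, b) sample is the difference of two i.i.d. exponentials with mean b.
    return true_value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privacy_loss(output: float, value_d: float, value_d_prime: float,
                 sensitivity: float, epsilon: float) -> float:
    """ln(p(output | D) / p(output | D')) for the Laplace mechanism.

    If |value_d - value_d_prime| <= sensitivity, its magnitude is at most epsilon.
    """
    scale = sensitivity / epsilon
    return (abs(output - value_d_prime) - abs(output - value_d)) / scale


def shannon_entropy(samples) -> float:
    """Empirical Shannon entropy, in bits, of a list of discrete values."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


class PrivacyAccountant:
    """Tracks cumulative epsilon across updates under basic sequential composition."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Under sequential composition, the epsilons of successive releases add up.
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    true_count = 42  # hypothetical count of training documents with a private attribute
    accountant = PrivacyAccountant(budget=1.0)
    for step in range(4):  # four hypothetical training updates at epsilon = 0.25 each
        accountant.charge(0.25)
        released = laplace_mechanism(true_count, 1.0, 0.25, rng)
        loss = privacy_loss(released, true_count, true_count + 1, 1.0, 0.25)
        print(f"update {step}: released={released:.2f}, loss={loss:+.3f}, "
              f"cumulative epsilon={accountant.spent}")
    for eps in (0.5, 5.0):
        noisy = [round(laplace_mechanism(true_count, 1.0, eps, rng)) for _ in range(5000)]
        print(f"epsilon={eps}: entropy = {shannon_entropy(noisy):.2f} bits")
```

A larger ε shrinks the noise scale, so the released values carry less entropy and the guarantee weakens; this is the privacy vs. utility trade‐off listed among the abstract's comparison measures. Simple summation of ε is the most conservative accountant; tighter accounting methods exist.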

More From: IET Blockchain

Journal: IET Blockchain
Publication Date: Nov 17, 2024
License type: CC BY 4.0

Similar Papers

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

ChatGPT Isn't Magic
Tama Leaver ... Suzanne Srdarov
M/C Journal | VOL. 26
02 Oct 2023

The rise of artificial intelligence: addressing the impact of large language models such as ChatGPT on scientific publications.
Tiing Leong Ang ... Kay Choong See
Singapore Medical Journal | VOL. 64
30 Mar 2023

Response to M. Trengove & coll regarding "Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine".
Stefan Harrer
eBioMedicine | VOL. 93
01 Jul 2023