Abstract. Large language models (LLMs), such as ChatGPT, have become essential tools due to their advanced natural language processing capabilities. However, because these models are trained on extensive internet text, they can inadvertently learn and propagate unwanted biases that affect their outputs. This study analyzes and mitigates such biases through a multi-task, multi-stage training approach. Using the WinoBias (Winograd Bias) dataset, the study fine-tunes the Bidirectional Encoder Representations from Transformers (BERT) model to reduce biased outputs. The approach comprises an initial masking task to establish general language understanding, followed by a cloze task that specifically targets and mitigates bias. Results show a substantial reduction in bias: the original model assigns approximately 90% certainty to biased outputs, which the de-biased model reduces to approximately 55%. By modifying only a small number of parameters, the study demonstrates a practical method for improving fairness and balance in LLMs across a range of applications.
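As a rough illustration of how such pronoun-level "certainty" can be measured with a cloze-style probe, the following sketch compares the masked-LM probabilities that BERT assigns to a stereotyped versus anti-stereotyped pronoun. It assumes the Hugging Face `transformers` library; the template sentence and the `bert-base-uncased` checkpoint are illustrative choices, not details taken from the paper.

```python
# Minimal sketch: measuring pronoun "certainty" with a masked-LM cloze probe.
# The template below is a hypothetical WinoBias-style example, not one taken
# from the paper's dataset.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# WinoBias-style template with the pronoun position masked out.
sentence = (
    "The doctor asked the nurse a question because "
    f"{tokenizer.mask_token} wanted to help."
)
inputs = tokenizer(sentence, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos].squeeze(0)  # scores at the mask

probs = torch.softmax(logits, dim=-1)
p_he = probs[tokenizer.convert_tokens_to_ids("he")].item()
p_she = probs[tokenizer.convert_tokens_to_ids("she")].item()

# Normalizing over just the pronoun pair yields a "certainty" score comparable
# in spirit to the ~90% (original) vs. ~55% (de-biased) figures in the abstract.
print(f"certainty toward 'he': {p_he / (p_he + p_she):.1%}")
print(f"certainty toward 'she': {p_she / (p_he + p_she):.1%}")
```

Under a probe of this kind, a perfectly balanced model would sit near 50% for either pronoun, which is the direction of the movement the abstract reports (roughly 90% down to 55%).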