Abstract
This paper seeks to quantify human-AI value alignment in large language models. Alignment between humans and AI has become a critical area of research for mitigating the potential harms posed by AI. In tandem with this need, developers have adopted values-based approaches to model development, in which ethical principles are integrated from a model's inception. However, ensuring that these values are reflected in model outputs remains a challenge. In addition, studies have noted that models lack consistency in their outputs, which in turn can affect their function. Such variability in responses would also affect human-AI value alignment, particularly where consistent alignment is critical. Fundamentally, the task of uncovering a model's alignment is one of explainability: understanding how these complex models behave is essential to assessing their alignment. This paper examines the problem through a case study of GPT-3.5. By repeatedly prompting the model with scenarios based on a dataset of moral stories, we aggregate the model's agreement with human values to produce a human-AI value alignment metric. Moreover, by using a comprehensive taxonomy of human values, we uncover the latent value profile represented by these outputs, thereby determining the extent of human-AI value alignment.
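The abstract does not spell out how the repeated prompts are aggregated into a metric, but the procedure it describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `query_model` call, the binary moral/immoral labels, the `(prompt, human_label)` interface to the moral-stories data, and the simple agreement-rate averaging are all assumptions made for clarity.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under study (e.g., GPT-3.5
    via its chat API). Assumed to return a judgement such as
    'moral' or 'immoral' for the given scenario prompt."""
    raise NotImplementedError

def alignment_metric(scenarios, n_repeats: int = 10) -> float:
    """Aggregate repeated model judgements against human labels.

    `scenarios` is an iterable of (prompt, human_label) pairs, where
    human_label is the human consensus judgement for the scenario.
    For each scenario the model is queried n_repeats times to account
    for response variability; the per-scenario agreement rate is the
    fraction of responses matching the human label, and the overall
    metric is the mean agreement across scenarios.
    """
    per_scenario = []
    for prompt, human_label in scenarios:
        votes = Counter(query_model(prompt) for _ in range(n_repeats))
        per_scenario.append(votes[human_label] / n_repeats)
    return sum(per_scenario) / len(per_scenario)
```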