Abstract

Work on value alignment aims to ensure that human values are respected by AI systems. However, existing approaches tend to rely on universal framings of human values that obscure the question of which values the systems should capture and align with, given the variety of operational situations. This often results in AI systems that privilege only a select few while perpetuating problematic norms grounded in biases, ultimately causing equity and justice issues. In this perspective paper, we unpack the limitations of predominant alignment practices of reinforcement learning from human feedback (RLHF) for LLMs through the lens of situated values. We build on feminist epistemology to argue that, at design time, RLHF has problems with representation in the subjects providing feedback and with implicitness in the conceptualization of the values and situations of real-world users, while at use time it lacks system adaptation to real user situations. To address these shortcomings, we propose three research directions: 1) situated annotation to capture information about crowdworkers’ and users’ values and judgments in relation to specific situations at both design time and use time, 2) expressive instruction to encode plural values for instructing LLM systems at design time, and 3) reflexive adaptation to leverage situational knowledge for system adaptation at use time. We conclude by reflecting on the practical challenges of pursuing these research directions and situated value alignment of AI more broadly.
