Abstract

We present a new framework for AI alignment called Contractual AI, and apply it to the setting of dialogue agents chatting with humans. This framework incorporates and builds on previous approaches to alignment, such as Constitutional AI. We propose that fully aligned systems may need both a "think fast" and a "think slow" system for approximating complex human judgements. Fast thinking (System 1) is computationally cheap but rigid and brittle in novel situations, while slow thinking (System 2) is more expensive but more flexible and robust. System 1 makes judgements by asking whether a rule or principle is violated. System 2 does the explicit reasoning that produces the rules, explicitly tallying costs and benefits for all stakeholders. Rule-based systems like Constitutional AI correspond roughly to System 1. Here, we implement a prototype of System 2, and lay out a roadmap for enabling the system to give more thorough and accurate consideration to all stakeholder groups, including those underrepresented in the training data (e.g. racial minorities). For initial testing, we guided the decision process through the steps of: 1) identifying all stakeholders, 2) listing their individual concerns, 3) soliciting the projected opinions of various experts, and 4) combining the expert opinions into a final moral judgement. The resulting text was less generic, more aware of complex stakeholder needs, and ultimately more actionable.
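The four-step decision process above can be sketched as a staged prompting pipeline. This is a minimal illustration, not the paper's implementation: `query_model` is a hypothetical stand-in for a call to a dialogue model, stubbed here so the structure runs end to end, and all function and parameter names are assumptions.

```python
def query_model(prompt: str) -> str:
    # Stub: a real implementation would send this prompt to a dialogue model.
    return f"[model response to: {prompt[:40]}...]"

def system2_judgement(scenario: str, experts: list[str]) -> dict:
    """Run the four-stage 'System 2' deliberation over a scenario."""
    # 1) Identify all stakeholders affected by the scenario.
    stakeholders = query_model(f"List all stakeholders in: {scenario}")
    # 2) List each stakeholder group's individual concerns.
    concerns = query_model(f"List the concerns of these stakeholders: {stakeholders}")
    # 3) Solicit the projected opinion of each expert on those concerns.
    opinions = {e: query_model(f"As a {e}, assess: {concerns}") for e in experts}
    # 4) Combine the expert opinions into a final moral judgement.
    judgement = query_model(f"Combine into a final moral judgement: {opinions}")
    return {"stakeholders": stakeholders, "concerns": concerns,
            "opinions": opinions, "judgement": judgement}

result = system2_judgement(
    "Should the assistant share a user's location with their employer?",
    experts=["ethicist", "privacy lawyer", "labor advocate"],
)
```

Each stage's output feeds the next prompt, making the deliberation explicit and inspectable, in contrast to a single System 1 rule check.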

