Abstract

We present a new framework for AI alignment called Contractual AI, and apply it to the setting of dialogue agents chatting with humans. This framework incorporates and builds on previous approaches to alignment, such as Constitutional AI. We propose that fully aligned systems may need both a "think fast" and a "think slow" system for approximating complex human judgements. Fast thinking (System 1) is computationally cheap but rigid and brittle in novel situations, while slow thinking (System 2) is more expensive but more flexible and robust. System 1 makes judgements by asking whether a rule or principle is violated. System 2 does the explicit reasoning that produces the rules, explicitly tallying costs and benefits for all stakeholders. Rule-based systems like Constitutional AI correspond roughly to System 1. Here, we implement a prototype of System 2, and lay out a roadmap for enabling the system to make more thorough and accurate considerations for all stakeholder groups, including those underrepresented in the training data (e.g. racial minorities). For initial testing, we guided the decision process through the steps of: 1) identifying all stakeholders, 2) listing their individual concerns, 3) soliciting the projected opinions of various experts, and 4) combining the expert opinions into a final moral judgement. The resulting text was less generic, more aware of complex stakeholder needs, and ultimately more actionable.
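The four-step decision process above can be sketched as a chain of model queries. This is a hypothetical illustration, not the paper's actual implementation: the `ask` callback stands in for a call to a dialogue agent, and a deterministic stub is used here so the sketch is runnable.

```python
# Hedged sketch of the four-step System 2 deliberation pipeline
# (hypothetical; not the authors' actual code). `ask` stands in for
# a query to a dialogue agent.

def deliberate(scenario, experts, ask):
    """Apply the four steps: stakeholders -> concerns -> experts -> judgement."""
    # 1) Identify all stakeholders (the stub below returns one per line).
    stakeholders = ask(f"List stakeholders affected by: {scenario}").splitlines()
    # 2) List each stakeholder's individual concerns.
    concerns = {s: ask(f"Concerns of {s} about: {scenario}") for s in stakeholders}
    # 3) Solicit projected opinions from each expert persona.
    opinions = {e: ask(f"As {e}, weigh these concerns: {concerns}") for e in experts}
    # 4) Combine the expert opinions into a final moral judgement.
    return ask(f"Combine into a final judgement: {opinions}")

def stub_ask(prompt):
    """Deterministic stand-in for a language-model call, for demonstration."""
    if prompt.startswith("List stakeholders"):
        return "end users\nbystanders"
    return f"<answer to: {prompt.split(':')[0]}>"

judgement = deliberate("deploying a chatbot", ["an ethicist", "a lawyer"], stub_ask)
```

In a real system each `ask` call would prompt the agent and the final step would aggregate the expert opinions, including those representing underrepresented stakeholder groups.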
