Abstract

I argue that ensuring artificial intelligence (AI) retains alignment with human values over time is critical yet understudied. Most research focuses on static alignment, neglecting the retention dynamics that enable stability as systems learn and act autonomously. This paper elucidates the limitations that constrain provable retention, arguing that key gaps include formalizing retention dynamics, achieving transparency in advanced systems, scaling participatory design, and managing the risks of uncontrolled recursive self-improvement. I synthesize technical and ethical perspectives into a conceptual framework, grounded in control theory and philosophy, for analyzing these dynamics. I argue that priorities should shift toward capability modulation, participatory design, and advanced modeling to verify enduring alignment. Overall, I argue that realizing AI that remains safely aligned throughout its lifetime requires translating principles into formal methods, demonstrations, and systems that integrate technical and humanistic rigor.
