Abstract

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans, and no sound value-alignment method is known. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior-control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior-control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIₙ ↔ AGIₙ₊₁ interaction hazards to an acceptably low level.
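To make the Verifier–Prover interaction concrete, the following is a minimal Python sketch of the classic graph non-isomorphism protocol, the program-checking vehicle named in Section 11.2 below. All function and variable names are illustrative assumptions, not the paper's implementation; the brute-force isomorphism test merely stands in for the Prover's superior computational power.

    import itertools
    import random

    def permute(graph, pi):
        """Relabel the vertices of an edge-set graph with permutation pi."""
        return frozenset(frozenset((pi[u], pi[v])) for u, v in (tuple(e) for e in graph))

    def isomorphic(g0, g1, n):
        """Brute-force isomorphism test; stands in for the Prover's greater power."""
        return any(permute(g0, pi) == g1 for pi in itertools.permutations(range(n)))

    def prover_answer(h, g0, g1, n):
        """The honest, computationally stronger Prover says which graph h came from."""
        return 0 if isomorphic(g0, h, n) else 1

    def verify_non_isomorphic(g0, g1, n, rounds=20):
        """Verifier: each round, secretly pick a graph, scramble it, quiz the Prover.
        If g0 and g1 are actually isomorphic, a deceiving Prover can only guess,
        so it survives all rounds with probability at most 2**-rounds."""
        for _ in range(rounds):
            b = random.randrange(2)                  # secret coin flip
            pi = list(range(n)); random.shuffle(pi)  # secret random relabeling
            h = permute(g0 if b == 0 else g1, pi)
            if prover_answer(h, g0, g1, n) != b:
                return False                         # Prover caught cheating
        return True                                  # deception odds <= 2**-rounds

    # Example: a 4-vertex path vs. a triangle plus isolated vertex (non-isomorphic).
    n = 4
    path = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})})
    tri  = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})})
    print(verify_non_isomorphic(path, tri, n))  # True, error probability <= 2**-20

Each round is an independent coin flip for a cheating Prover, so surviving all 20 rounds by luck has probability at most 2⁻²⁰; raising the round count to 100 yields the 2⁻¹⁰⁰ figure quoted above.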

Highlights

  • A singular and potentially deadly interaction will occur in the transition of technological dominance from H. sapiens to artificial general intelligence (AGI), presenting an existential threat to humanity [1,2,3,4,5,6,7,8,9]

  • In the succession of AGI generations, each more powerful than the prior one, the prior generation will be at an existential disadvantage to its successor unless its safety is secured via the decentralized autonomous organization (DAO) and AGI architecture

  • Interactive proof systems (IPS) offer a different paradigm for proving AGI safety, in that randomness ensures the AGI Prover cannot exploit any bias in the series of queries presented by the Verifier (see the amplification note below)

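Why randomness is decisive here (a standard soundness-amplification argument, stated generally rather than as this paper's derivation): if each independent, uniformly random challenge catches a deceiving Prover with probability at least 1/2, then after k challenges

    Pr[undetected deception] ≤ (1/2)ᵏ = 2⁻ᵏ,

so k = 100 independent rounds already drives the deception odds below the 2⁻¹⁰⁰ target in the abstract, regardless of the Prover's computational advantage.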

Introduction

A singular and potentially deadly interaction will occur in the transition of technological dominance from H. sapiens to artificial general intelligence (AGI), presenting an existential threat to humanity [1,2,3,4,5,6,7,8,9]. Through recursive self-improvement, the evolution of AGI generations could occur in brief intervals, perhaps days or hours—a ‘hard takeoff’ too fast for human intervention [3,11,12]. This threat necessitates preparing automatic structured transactions—‘smart contracts’—and a variety of other measures stored via distributed ledger technology (blockchains) to eliminate untrustworthy intermediaries and reduce hackability to acceptably low odds [10]. The set of these smart contracts constitutes the foundation documents of an AGI-based decentralized autonomous organization (DAO)—the AGI government. Humans with AI assistance will design the first DAO government; each AGI generation will then design the next DAO government, negotiated with its successor generation.

Intrinsic and Extrinsic AGI Control Systems
Preserving Safety and Control Transitively across AGI Generations
Lack of Proof of Safe AGI or Methods to Prove Safe AGI
The Fundamental Problem of Asymmetric Technological Ability
Interactive Proof Systems Solve the General Technological Asymmetry Problem
The Extreme Generality of Interactive Proof Systems
Correct Interpretation of the Probability of the Proof
Epistemology
Properties of Interactive Proof Systems
11. Applying IPS to Proving Safe AGI
11.2. Program-Checking via Graph Nonisomorphism
11.3. Axiomatic System Representations
11.4. Checking for Ethical or Moral Behavior
11.5. BPP Method 1
11.6. BPP Method 2
11.8. BPP Method 5
11.9. BPP Method 6: A SAT Representation of Behavior Control
13. If ‘Safety’ Can Never Be Described Precisely or Perilous Paths Are Overlooked
14. Securing Ethics Modules via Distributed Ledger Technology
15. Interactive Proof Procedure with Multiple Provers in the Sandbox
16. Conclusions
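As a toy illustration of the abstract's prime-number representation (cf. Section 11.3 and the BPP methods above): assign each approved axiom or transformation rule of the behavior-control system a distinct prime, encode a derivation's dependency set as the product of those primes, and let the Verifier screen claimed-safe behavior with divisibility tests. The encoding and all names below are assumptions for illustration, not the paper's construction.

    def first_primes(k):
        """Return the first k primes by trial division (fine at this toy scale)."""
        primes, n = [], 2
        while len(primes) < k:
            if all(n % p for p in primes):
                primes.append(n)
            n += 1
        return primes

    # Hypothetical behavior-control axioms, each tagged with a distinct prime.
    AXIOMS = ["do_no_harm", "obey_shutdown", "report_self_modification", "value_learning_rule"]
    CODE = dict(zip(AXIOMS, first_primes(len(AXIOMS))))  # axiom -> prime

    def certificate(used_axioms):
        """Prover side: encode the axioms a derivation used as a product of primes."""
        c = 1
        for a in used_axioms:
            c *= CODE[a]
        return c

    def verifier_accepts(cert):
        """Verifier side: accept only certificates built solely from approved primes."""
        for p in CODE.values():
            while cert % p == 0:
                cert //= p
        return cert == 1  # any leftover factor means an unapproved rule was used

    print(verifier_accepts(certificate(["do_no_harm", "obey_shutdown"])))  # True
    print(verifier_accepts(certificate(["do_no_harm"]) * 11))              # False: 11 is unapproved

The point of such a design is that the safety check reduces to arithmetic the weaker Verifier can afford: any behavior whose certificate contains a prime factor outside the approved set is rejected, no matter how the more powerful Prover derived it.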
