Abstract

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans, and no sound value-alignment method is known. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior-control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior-control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIₙ ↔ AGIₙ₊₁ interaction hazards to an acceptably low level.
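To make the Verifier–Prover interaction concrete, the following is a minimal Python sketch of the classic graph non-isomorphism protocol, the program-checking vehicle named in Section 11.2 below. All function and variable names are illustrative assumptions, not the paper's implementation; the brute-force isomorphism test merely stands in for the Prover's superior computational power.

    import itertools
    import random

    def permute(graph, pi):
        """Relabel the vertices of an edge-set graph with permutation pi."""
        return frozenset(frozenset((pi[u], pi[v])) for u, v in (tuple(e) for e in graph))

    def isomorphic(g0, g1, n):
        """Brute-force isomorphism test; stands in for the Prover's greater power."""
        return any(permute(g0, pi) == g1 for pi in itertools.permutations(range(n)))

    def prover_answer(h, g0, g1, n):
        """The honest, computationally stronger Prover says which graph h came from."""
        return 0 if isomorphic(g0, h, n) else 1

    def verify_non_isomorphic(g0, g1, n, rounds=20):
        """Verifier: each round, secretly pick a graph, scramble it, quiz the Prover.
        If g0 and g1 are actually isomorphic, a deceiving Prover can only guess,
        so it survives all rounds with probability at most 2**-rounds."""
        for _ in range(rounds):
            b = random.randrange(2)                  # secret coin flip
            pi = list(range(n)); random.shuffle(pi)  # secret random relabeling
            h = permute(g0 if b == 0 else g1, pi)
            if prover_answer(h, g0, g1, n) != b:
                return False                         # Prover caught cheating
        return True                                  # deception odds <= 2**-rounds

    # Example: a 4-vertex path vs. a triangle plus isolated vertex (non-isomorphic).
    n = 4
    path = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})})
    tri  = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})})
    print(verify_non_isomorphic(path, tri, n))  # True, error probability <= 2**-20

Each round is an independent coin flip for a cheating Prover, so surviving all 20 rounds by luck has probability at most 2⁻²⁰; raising the round count to 100 yields the 2⁻¹⁰⁰ figure quoted above.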

Highlights

  • A singular and potentially deadly interaction will occur in the transition of technological dominance from H. sapiens to artificial general intelligence (AGI), presenting an existential threat to humanity [1,2,3,4,5,6,7,8,9]

  • In the succession of AGI generations, each more powerful than the prior one, the prior generation will be at an existential disadvantage to its successor unless its safety is secured via the decentralized autonomous organization (DAO) and AGI architecture

  • Interactive proof systems (IPS) offer a different paradigm for proving AGI safety, in that randomness ensures the AGI Prover cannot exploit any bias in the series of queries presented by the Verifier (see the amplification note below)

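Why randomness is decisive here (a standard soundness-amplification argument, stated generally rather than as this paper's derivation): if each independent, uniformly random challenge catches a deceiving Prover with probability at least 1/2, then after k challenges

    Pr[undetected deception] ≤ (1/2)ᵏ = 2⁻ᵏ,

so k = 100 independent rounds already drives the deception odds below the 2⁻¹⁰⁰ target in the abstract, regardless of the Prover's computational advantage.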

Introduction

A singular and potentially deadly interaction will occur in the transition of technological dominance from H. sapiens to artificial general intelligence (AGI), presenting an existential threat to humanity [1,2,3,4,5,6,7,8,9]. Through recursive self-improvement, the evolution of AGI generations could occur in brief intervals, perhaps days or hours—a ‘hard takeoff’ too fast for human intervention [3,11,12]. This threat necessitates preparing automatic structured transactions—‘smart contracts’—and a variety of other measures stored via distributed ledger technology (blockchains) to eliminate untrustworthy intermediaries and reduce hackability to acceptably low odds [10]. The set of these smart contracts constitutes the foundation documents of an AGI-based decentralized autonomous organization (DAO)—the AGI government. Humans with AI assistance will design the first DAO government; each AGI generation will then design the next DAO government, negotiated with its successor generation.

Intrinsic and Extrinsic AGI Control Systems
Preserving Safety and Control Transitively across AGI Generations
Lack of Proof of Safe AGI or Methods to Prove Safe AGI
The Fundamental Problem of Asymmetric Technological Ability
Interactive Proof Systems Solve the General Technological Asymmetry Problem
The Extreme Generality of Interactive Proof Systems
Correct Interpretation of the Probability of the Proof
Epistemology
Properties of Interactive Proof Systems
11. Applying IPS to Proving Safe AGI
11.2. Program-Checking via Graph Nonisomorphism
11.3. Axiomatic System Representations
11.4. Checking for Ethical or Moral Behavior
11.5. BPP Method 1
11.6. BPP Method 2
11.8. BPP Method 5
11.9. BPP Method 6: A SAT Representation of Behavior Control
13. If ‘Safety’ Can Never Be Described Precisely or Perilous Paths Are Overlooked
14. Securing Ethics Modules via Distributed Ledger Technology
15. Interactive Proof Procedure with Multiple Provers in the Sandbox
16. Conclusions
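As a toy illustration of the abstract's prime-number representation (cf. Section 11.3 and the BPP methods above): assign each approved axiom or transformation rule of the behavior-control system a distinct prime, encode a derivation's dependency set as the product of those primes, and let the Verifier screen claimed-safe behavior with divisibility tests. The encoding and all names below are assumptions for illustration, not the paper's construction.

    def first_primes(k):
        """Return the first k primes by trial division (fine at this toy scale)."""
        primes, n = [], 2
        while len(primes) < k:
            if all(n % p for p in primes):
                primes.append(n)
            n += 1
        return primes

    # Hypothetical behavior-control axioms, each tagged with a distinct prime.
    AXIOMS = ["do_no_harm", "obey_shutdown", "report_self_modification", "value_learning_rule"]
    CODE = dict(zip(AXIOMS, first_primes(len(AXIOMS))))  # axiom -> prime

    def certificate(used_axioms):
        """Prover side: encode the axioms a derivation used as a product of primes."""
        c = 1
        for a in used_axioms:
            c *= CODE[a]
        return c

    def verifier_accepts(cert):
        """Verifier side: accept only certificates built solely from approved primes."""
        for p in CODE.values():
            while cert % p == 0:
                cert //= p
        return cert == 1  # any leftover factor means an unapproved rule was used

    print(verifier_accepts(certificate(["do_no_harm", "obey_shutdown"])))  # True
    print(verifier_accepts(certificate(["do_no_harm"]) * 11))              # False: 11 is unapproved

The point of such a design is that the safety check reduces to arithmetic the weaker Verifier can afford: any behavior whose certificate contains a prime factor outside the approved set is rejected, no matter how the more powerful Prover derived it.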
