Abstract

In recent years, artificial intelligence (AI) safety has gained international recognition in light of heterogeneous safety-critical and ethical issues that risk overshadowing the broad beneficial impacts of AI. In this context, the implementation of AI observatory endeavors represents one key research direction. This paper motivates the need for an inherently transdisciplinary AI observatory approach integrating diverse retrospective and counterfactual views. We delineate aims and limitations while providing hands-on advice utilizing concrete practical examples. Distinguishing between unintentionally and intentionally triggered AI risks with diverse socio-psycho-technological impacts, we exemplify a retrospective descriptive analysis followed by a retrospective counterfactual risk analysis. Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety. As a further contribution, we discuss differentiated and tailored long-term directions through the lens of two disparate modern AI safety paradigms. For simplicity, we refer to these two paradigms with the terms artificial stupidity (AS) and eternal creativity (EC), respectively. While both AS and EC acknowledge the need for a hybrid cognitive-affective approach to AI safety and overlap with regard to many short-term considerations, they differ fundamentally in the nature of multiple envisaged long-term solution patterns. By compiling relevant underlying contradistinctions, we aim to provide future-oriented incentives for constructive dialectics in practical and theoretical AI safety research.

Highlights

  • As can already be seen from the scope of the artificial intelligence (AI) safety guidelines proposed in Section 5.1.1, which are grounded in our AI observatory exemplification of retrospective descriptive analysis (RDA) and retrospective counterfactual risk analysis (RCRA) [14], modern AI technology cannot be analyzed in isolation.

  • While the previous Section 5.1.1 focused on guidelines concerning the AI risks Ia and Ib related to intentional malice, this Section 5.1.2 is linked to the risks Ic and Id related to mistakes and unintentional failures, which are often of an ethically relevant nature.

  • Starting with a cybersecurity-oriented fit-for-purpose taxonomy of ethical distinction, we introduced and exemplified a retrospective descriptive analysis (RDA) for future AI observatory projects.

Summary

Motivation

The importance of addressing artificial intelligence (AI) safety, AI ethics and AI governance issues has been acknowledged at an international level across diverse AI research subfields [1,2,3,4,5,6]. We propose a taxonomy-based retrospective descriptive analysis (RDA) and a so-called retrospective counterfactual risk analysis (RCRA) [14]. The remainder of the paper is organized as follows: in Section 2, we first introduce a simple fit-for-purpose AI risk taxonomy as a basis for classification within RDAs and RCRAs for AI observatory projects.
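
To make the intended use of such a taxonomy more concrete, the following minimal Python sketch (an illustration, not the paper's own data model) shows how an AI observatory could tag documented risk instantiations with the four risk types Ia, Ib, Ic and Id from the fit-for-purpose taxonomy and group them as a simple pre-processing step ahead of an RCRA; the class names, field names and example entries are hypothetical.

    from collections import defaultdict
    from dataclasses import dataclass, field
    from enum import Enum


    class RiskType(Enum):
        """Risk instantiations of the fit-for-purpose taxonomy (glosses follow the highlights above)."""
        IA = "Ia"  # intentionally triggered, related to intentional malice
        IB = "Ib"  # intentionally triggered, related to intentional malice
        IC = "Ic"  # unintentionally triggered, related to mistakes (often ethically relevant)
        ID = "Id"  # unintentionally triggered, related to unintentional failures (often ethically relevant)


    @dataclass
    class RDAEntry:
        """One retrospectively documented AI risk instantiation recorded for an observatory RDA."""
        label: str                                    # hypothetical case label
        risk_type: RiskType                           # position within the taxonomy
        description: str                              # short factual account of the instantiation
        impacts: list = field(default_factory=list)   # observed socio-psycho-technological impacts


    def group_by_risk_type(entries):
        """Group RDA entries by risk type, e.g. as input for later RCRA pre-processing."""
        grouped = defaultdict(list)
        for entry in entries:
            grouped[entry.risk_type].append(entry)
        return dict(grouped)


    # Hypothetical usage with placeholder entries:
    entries = [
        RDAEntry("case-01", RiskType.IA, "documented intentionally triggered incident", ["societal"]),
        RDAEntry("case-02", RiskType.ID, "documented unintentional failure", ["technological"]),
    ]
    print(group_by_risk_type(entries))

In the paper itself, the RDA and RCRA are carried out as qualitative, example-based analyses; the sketch merely mirrors their classificatory logic.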

Simple AI Risk Taxonomy
Aims and Limitations
RDA for AI Risk Instantiations Ia and Ib—Examples
RDA for AI Risk Instantiations Ic and Id—Examples
Preparatory Procedure
Exemplary RDA-Based RCRA for AI Observatory Projects
Downward Counterfactual DF Narrative A′a2
Downward Counterfactual DF Narrative A′a3
Downward Counterfactual DF Narrative A′a4
Downward Counterfactual DF Narrative R′a1
Downward Counterfactual DF Narrative E′a1
Downward Counterfactual DF Narrative R′b
Downward Counterfactual DF Narrative E′c1
Downward Counterfactual DF Narrative F′d
Discussion
Near-Term Guidelines for Risks Ia and Ib
Near-Term Guidelines for Risks Ic and Id
Long-Term Directions and Future-Oriented Contradistinctions
RDA Data Collection
Interlinking RDA-Based RCRA Pre-Processing and RCRA DFs
Conclusions