Abstract

We propose a new probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning. We consider several problems arising in the design process, including training a system to be robust to rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs, then sampling these to generate specialized training and test data. More generally, such languages can be used to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems such as autonomous cars and robots, whose environment at any point in time is a scene, a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes and the behaviors of their agents over time. Scenic combines concise, readable syntax for spatiotemporal relationships with the ability to declaratively impose hard and soft constraints over the scenario. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic’s domain-specific syntax. Finally, we apply Scenic in multiple case studies for training, testing, and debugging neural networks for perception both as standalone components and within the context of a full cyber-physical system.

Highlights

  • Machine learning (ML) is increasingly used in safety-critical applications, thereby creating an acute need for techniques to gain higher assurance in ML-based systems (Russell et al 2015; Seshia et al 2016; Amodei et al 2016)

  • We propose a methodology for training, testing, and debugging ML-based cyber-physical systems using probabilistic programming languages

  • Training the car-detection network on a state-of-the-art synthetic dataset obtained by randomly driving around inside the simulated world of Grand Theft Auto V (GTAV) and capturing images periodically (Johnson-Roberson et al 2017), we find its performance is significantly worse on the overlapping images


Summary

Introduction

Machine learning (ML) is increasingly used in safety-critical applications, thereby creating an acute need for techniques to gain higher assurance in ML-based systems (Russell et al 2015; Seshia et al 2016; Amodei et al 2016).

We could specify a particular model, or a non-default distribution over models, by just adding "with model M" to the definition of the Car. More interestingly, we could produce a scenario for badly-parked cars by adding two lines:

    spot = OrientedPoint on visible curb
    badAngle = Uniform(1.0, -1.0) * Range(10, 20) deg
    Car left of spot by 0.5,
        facing badAngle relative to roadDirection

Scenic provides a simple mutation system that improves compositionality by providing a mechanism to add variety to a scenario without changing its code. This is useful, for example, if we have a scenario encoding a single concrete scene obtained from real-world data and want to quickly generate variations.

We outline Scenic’s support for dynamic scenarios, as well as for composing multiple scenarios together to produce more complex ones.
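To make the distribution behind badAngle concrete, here is a minimal sketch in plain Python (not Scenic). It assumes that Uniform(1.0, -1.0) picks one of the two listed values with equal probability, that Range(10, 20) is uniform over the interval, and that deg converts degrees to radians. The function names and the road_direction parameter are hypothetical, introduced only for illustration.

```python
import math
import random


def sample_bad_angle(rng: random.Random) -> float:
    """Sample an angle offset in radians, mirroring
    Uniform(1.0, -1.0) * Range(10, 20) deg under the assumptions above."""
    sign = rng.choice([1.0, -1.0])        # assumed: uniform over the listed values
    magnitude_deg = rng.uniform(10, 20)   # assumed: uniform over the interval
    return math.radians(sign * magnitude_deg)


def sample_car_heading(road_direction: float, rng: random.Random) -> float:
    """Hypothetical helper: the car's heading is the sampled offset
    taken relative to the road direction (both in radians)."""
    return road_direction + sample_bad_angle(rng)


rng = random.Random(0)
samples = [math.degrees(sample_bad_angle(rng)) for _ in range(1000)]
```

Every sample is between 10 and 20 degrees off the road direction, to one side or the other, so the car is always visibly misaligned and never nearly parallel to the curb.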

Dynamic scenarios
Compositional scenarios
Data types
Expressions
Specifiers
Statements
Semantics of Scenic
Domain-specific sampling techniques
Experiments
Experimental setup
Testing and falsification
Testing a perception module
Falsifying a dynamic closed-loop system
Training on rare events
Debugging failures
Related work
Findings
Conclusion