Abstract

We propose a new symbolic trace semantics for register automata (extended finite state machines) which records both the sequence of input symbols that occur during a run and the constraints on input parameters that are imposed by this run. Our main result is a generalization of the classical Myhill-Nerode theorem to this symbolic setting. Our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence ≡l captures that symbolic traces end in the same location, transition equivalence ≡t captures that they share the same final transition, and a partial equivalence relation ≡r captures that symbolic values v and v′ are stored in the same register after symbolic traces w and w′, respectively. A symbolic language is defined to be regular if relations ≡l, ≡t and ≡r exist that satisfy certain conditions; in particular, they must all have finite index. We show that the symbolic language associated with a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. Our result provides a foundation for grey-box learning algorithms in settings where the constraints on data parameters can be extracted from code using, e.g., tools for symbolic/concolic execution or tainting. Moving to a grey-box setting may overcome the scalability problems of state-of-the-art black-box learning algorithms.
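
For orientation, the classical theorem being generalized can be stated as follows (a standard textbook formulation, included only for comparison; the paper's symbolic version replaces the single relation below by the three relations ≡l, ≡t and ≡r described above):

\[
  u \equiv_L v \;\Longleftrightarrow\; \forall w \in \Sigma^{*}:\ uw \in L \Leftrightarrow vw \in L
\]

A language L is then regular if and only if ≡_L has finite index (finitely many equivalence classes), and the classes of ≡_L are precisely the states of the minimal DFA for L. The construction in this paper plays the analogous role for symbolic languages: the locations, transitions and registers of the constructed register automaton arise as equivalence classes of ≡l, ≡t and ≡r, respectively.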

Highlights

  • Model learning (a.k.a. active automata learning) is a black-box technique which constructs state machine models of software and hardware components from information obtained by providing inputs and observing the resulting outputs

  • The SL∗ algorithm of Cassel et al. [17] for active learning of register automata is directly based on a generalization of the classical Myhill-Nerode theorem to a setting of data languages and register automata

  • We were able to learn models of systems that are completely out of reach of black-box techniques, such as “combination locks”: systems that only exhibit certain behaviors after a very specific sequence of inputs [32] (see the illustrative sketch after this list). All these approaches are rather ad hoc, and what is missing is a Myhill-Nerode theorem for this enriched setting that may serve as a foundation for grey-box model learning algorithms for a general class of register automata
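
To make the “combination lock” example concrete, the following is a small, purely hypothetical Python sketch (not taken from the paper or from [32]): a component that only reacts after one exact sequence of data values. A black-box learner must essentially guess the values 7, 3, 9 (chosen here only for illustration), whereas grey-box access to the code exposes the equality constraints directly, e.g. via symbolic/concolic execution or tainting.

# Hypothetical illustration (not from the paper): a tiny "combination lock"
# that only unlocks after the exact input sequence 7, 3, 9. A black-box
# learner must stumble on these values by querying; grey-box analysis of the
# code reveals the constraints p == 7, p == 3, p == 9 directly.

SECRET = (7, 3, 9)

class CombinationLock:
    def __init__(self):
        self.progress = 0          # how much of the secret has been entered
        self.unlocked = False

    def press(self, p: int) -> str:
        if self.unlocked:
            return "open"
        if p == SECRET[self.progress]:
            self.progress += 1
            if self.progress == len(SECRET):
                self.unlocked = True
                return "open"
            return "locked"
        self.progress = 0          # any wrong value resets the lock
        return "locked"

if __name__ == "__main__":
    lock = CombinationLock()
    print([lock.press(p) for p in (1, 7, 3, 9, 5)])
    # -> ['locked', 'locked', 'locked', 'open', 'open']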


Summary

Introduction

Model learning (a.k.a. active automata learning) is a black-box technique which constructs state machine models of software and hardware components from information obtained by providing inputs and observing the resulting outputs. By combining such learning with constraints on data parameters extracted from the code, we were able to learn models of systems that are completely out of reach of black-box techniques, such as “combination locks”: systems that only exhibit certain behaviors after a very specific sequence of inputs [32]. All these approaches are rather ad hoc, however, and what is missing is a Myhill-Nerode theorem for this enriched setting that may serve as a foundation for grey-box model learning algorithms for a general class of register automata. In the register automaton constructed from a regular symbolic language, the locations are equivalence classes of ≡l, the transitions are equivalence classes of ≡t, and the registers are equivalence classes of ≡r. In this way, we obtain a natural generalization of the classical Myhill-Nerode theorem to symbolic languages and register automata. Our result paves the way for efficient grey-box learning algorithms in settings where the constraints on data parameters can be extracted from the code.
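
To convey the flavour of this quotient construction, here is a small illustrative Python sketch (hypothetical code, not the paper's algorithm, and restricted to the classical data-free case): it builds a finite automaton whose states are Myhill-Nerode equivalence classes of a toy regular language. In the symbolic setting above, the locations, transitions and registers of the constructed register automaton are obtained analogously as equivalence classes of ≡l, ≡t and ≡r.

# Hypothetical sketch (not the paper's algorithm): building a finite automaton
# whose states are Myhill-Nerode equivalence classes of a toy language.

from itertools import product

SIGMA = "ab"

def in_language(word):
    # Toy regular language used only for illustration: words ending in "ab".
    return word.endswith("ab")

def equivalent(u, v, depth=4):
    # Approximate Myhill-Nerode equivalence by checking all suffixes up to a
    # bounded length (sufficient here because the toy language is small).
    for n in range(depth + 1):
        for w in map("".join, product(SIGMA, repeat=n)):
            if in_language(u + w) != in_language(v + w):
                return False
    return True

def build_quotient(max_len=3):
    # States are equivalence classes, represented by the shortest word seen.
    reps = []
    for n in range(max_len + 1):
        for u in map("".join, product(SIGMA, repeat=n)):
            if not any(equivalent(u, r) for r in reps):
                reps.append(u)
    # Transition function on representatives: class(u) --a--> class(ua).
    delta = {}
    for r in reps:
        for a in SIGMA:
            target = next(s for s in reps if equivalent(r + a, s))
            delta[(r, a)] = target
    return reps, delta

if __name__ == "__main__":
    states, delta = build_quotient()
    print("states (representatives):", states)
    for (r, a), s in sorted(delta.items()):
        print(f"  [{r!r}] --{a}--> [{s!r}]")

Running the script prints the three representatives '', 'a', 'ab' and the six transitions of the minimal DFA for the toy language of words ending in 'ab'.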
