Abstract

The ability to learn programs from few examples is a powerful technology with disruptive applications in many domains, as it allows users to automate repetitive tasks in an intuitive way. Existing inductive synthesis frameworks perform only syntactic manipulations, relying on the syntactic structure of the given examples rather than their meaning. Any semantic manipulation, such as transforming dates, has to be manually encoded by the designer of the inductive programming framework. Recent advances in large language models have shown these models to be very adept at performing semantic transformations of their input when given just a few examples of the task at hand. When it comes to syntactic transformations, however, these models are limited in their expressive power. In this paper, we propose a novel framework for integrating inductive synthesis with few-shot learning language models to combine the strengths of these two popular technologies. In particular, the inductive synthesizer is tasked with breaking the problem down into smaller subproblems, and those that cannot be solved syntactically are passed to the language model. We formalize three semantic operators that can be integrated with inductive synthesizers. To minimize invoking expensive semantic operators during learning, we introduce a novel deferred query execution algorithm that treats the operators as oracles during learning. We evaluate our approach in the domain of string transformations: the combined methodology can automate tasks that cannot be handled by either technology on its own. Finally, we demonstrate the generality of our approach via a case study in the domain of string profiling.
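The deferred execution idea described above can be sketched as a memoized oracle that caches model queries so the synthesizer never pays twice for the same semantic call. The sketch below is illustrative only: `fake_few_shot_model`, `SemanticOracle`, and the prompt format are hypothetical names standing in for a real language-model backend, not the paper's actual implementation.

```python
# Hypothetical sketch: a semantic operator backed by a few-shot model,
# wrapped as a caching "oracle" so expensive model calls are deferred
# and deduplicated during synthesis.

def fake_few_shot_model(prompt: str) -> str:
    # Stand-in for a real language-model call; here we hard-code one
    # semantic transformation (month name -> month number) for illustration.
    table = {"Jan": "01", "Feb": "02", "Mar": "03"}
    word = prompt.rsplit(" ", 1)[-1]
    return table.get(word, "?")

class SemanticOracle:
    """Caches model queries made during synthesis, counting real calls."""
    def __init__(self, model):
        self.model = model
        self.cache = {}
        self.calls = 0

    def query(self, x: str) -> str:
        if x not in self.cache:
            self.calls += 1
            self.cache[x] = self.model(f"Convert month to number: {x}")
        return self.cache[x]

oracle = SemanticOracle(fake_few_shot_model)
assert oracle.query("Jan") == "01"
assert oracle.query("Jan") == "01"  # second lookup served from the cache
assert oracle.calls == 1            # the model was invoked only once
```

The design choice here mirrors the paper's motivation: during search the synthesizer may propose the same semantic subproblem many times, so batching and caching queries keeps the number of model invocations small.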

Highlights

  • Teaching a machine to write programs that satisfy a given specification is widely regarded as one of the fundamental problems in artificial intelligence

  • We propose a novel framework for integrating pre-trained language models with inductive synthesis by augmenting the language over which programs are synthesized with semantic operators powered by the language model

  • This paper introduces a novel integration of two popular technologies: inductive program synthesis and autoregressive language models with few-shot learning capabilities

Summary

Introduction

Teaching a machine to write programs that satisfy a given specification is widely regarded as one of the fundamental problems in artificial intelligence. Inductive synthesis, or programming by example, where the specification is given as (partial) examples of the desired output for a given input, allows for the automation of repetitive tasks in a variety of domains. We introduce the FlashMeta framework, as well as the specific flavour of neural network that we use to solve those semantic problems that cannot be further decomposed using FlashMeta. All synthesizers that we described in the previous section are instantiations of the FlashMeta framework; they are implemented using the publicly available implementation of this framework, called PROSE. Given a specification of a program, typically as a set of input-output examples, the PROSE framework provides synthesis strategies to search for a program over a domain-specific language (DSL) that satisfies the given specification.
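The programming-by-example setup described above can be illustrated with a toy enumerative synthesizer: given input-output examples, search over compositions of DSL primitives until one is consistent with every example. This is a minimal sketch under assumed names (`PRIMS`, `synthesize`), not the PROSE API, whose DSLs and search strategies are far richer.

```python
# Toy programming-by-example sketch: enumerate short compositions of
# string primitives (a tiny "DSL") until all input-output examples hold.
from itertools import product

PRIMS = {
    "upper": str.upper,                    # uppercase the string
    "first_word": lambda s: s.split()[0],  # keep the first whitespace-separated token
    "strip": str.strip,                    # trim surrounding whitespace
}

def synthesize(examples, max_depth=2):
    """Return the first primitive sequence consistent with all examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMS, repeat=depth):
            def prog(s, names=names):
                for n in names:
                    s = PRIMS[n](s)
                return s
            if all(prog(i) == o for i, o in examples):
                return names
    return None  # no program of length <= max_depth fits the examples

examples = [("  ada lovelace ", "ADA"), (" alan turing", "ALAN")]
print(synthesize(examples))
```

In the paper's setting, subproblems that no composition of syntactic primitives can solve (e.g. converting a month name to its number) would instead be handed off to a semantic operator backed by the language model.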

