Abstract

In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation. In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, from which a supervision signal for training can be extracted. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform non-bipolar ramp loss objectives and minimum risk training on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.

Highlights

  • Sequence-to-sequence neural models are standardly trained using a maximum likelihood estimation (MLE) objective

  • In our first task of semantic parsing, question-answer pairs provide a weak supervision signal to find parses that execute to the correct answer

  • We show that ramp loss can outperform minimum risk training (MRT) if it incorporates bipolar supervision where parses that receive negative feedback are actively discouraged
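The bipolar intuition in the last highlight can be sketched in code. In a hope/fear-style ramp loss, a "hope" candidate (high model score and positive feedback) is promoted while a "fear" candidate (high model score but negative feedback) is actively discouraged. This is a minimal sketch only; the exact candidate selection and score/feedback interpolation used in the paper may differ, and `ramp_loss_bipolar` is a hypothetical helper name:

```python
def ramp_loss_bipolar(scores, feedback):
    """Bipolar ramp loss over a list of candidate outputs.

    scores:   model log-scores for each candidate
    feedback: external metric values (higher is better),
              e.g. 1 if a parse executes to the correct answer

    The hope candidate maximizes score + feedback; the fear
    candidate maximizes score - feedback. The loss rewards the
    hope output and penalizes the fear output, so minimizing it
    pushes probability mass from fear toward hope.
    """
    n = range(len(scores))
    hope = max(n, key=lambda i: scores[i] + feedback[i])
    fear = max(n, key=lambda i: scores[i] - feedback[i])
    return scores[fear] - scores[hope]
```

When hope and fear coincide the loss is zero, so only candidates whose model score and feedback disagree produce a gradient signal.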


Summary

Introduction

Sequence-to-sequence neural models are standardly trained using a maximum likelihood estimation (MLE) objective. MLE training requires full supervision by gold target structures, which in many scenarios are too difficult or expensive to obtain. For example, there are many domains for which no gold references exist, while crosslingual document-level links are present for many multilingual data collections. In this paper we investigate methods where a supervision signal for output structures can be extracted from weak feedback. We use learning from weak feedback, or weakly supervised learning, to refer to a scenario where output structures generated by the model are judged according to an external metric, and this feedback is used to extract a supervision signal that guides the learning process. Metric-augmented sequence-level objectives from reinforcement learning (Williams, 1992; Ranzato et al., 2016), minimum risk training (MRT) (Smith and Eisner, 2006; Shen et al., 2016), or margin-based structured prediction objectives (Taskar et al., 2005; Edunov et al., 2018) can be seen as instances of such algorithms.
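As a point of comparison for the objectives above, MRT minimizes the expected task cost under a renormalized model distribution over a sampled candidate set. The following is a minimal sketch, not the paper's implementation; `mrt_risk` and the smoothing exponent `alpha` are assumed names following the common formulation of Shen et al. (2016):

```python
import math

def mrt_risk(log_probs, costs, alpha=1.0):
    """Minimum risk training objective over sampled candidates.

    log_probs: unnormalized model log-probabilities of each sample
    costs:     task cost Delta(y) of each sample, e.g. 1 - BLEU

    The model scores are sharpened by alpha and renormalized over
    the sample (computed via a max-shift for numerical stability);
    the risk is the expected cost under that distribution.
    """
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    return sum((w / z) * c for w, c in zip(weights, costs))
```

Unlike the bipolar ramp loss, MRT weights all candidates softly by their renormalized probability rather than singling out one output to promote and one to discourage.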

