Abstract

Neural Architecture Search (NAS) methods, which automatically learn entire neural model or individual neural cell architectures, have recently achieved competitive or state-of-the-art (SOTA) performance on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification. In this work, we explore the applicability of a SOTA NAS algorithm, Efficient Neural Architecture Search (ENAS) (Pham et al., 2018), to two sentence pair tasks, paraphrase detection and semantic textual similarity. We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM. We explore the effectiveness of ENAS through experiments on three datasets (MRPC, SICK, STS-B), with two different models (ESIM, BiLSTM-Max), and two sets of embeddings (GloVe, BERT). In contrast to prior work applying ENAS to NLP tasks, our results are mixed: we find that ENAS architectures sometimes, but not always, outperform LSTMs and perform similarly to random architecture search.
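To make the idea of a micro-level cell search concrete, the sketch below is a simplified, illustrative example (not the code evaluated in this work) of how an ENAS-style derivation can be decoded into a recurrent cell that exposes the same step interface as an LSTM cell. The DERIVATION list, the ENASCell class, and all dimensions are assumptions made for illustration; the actual ENAS cell averages only its loose-end nodes and adds highway connections.

import torch
import torch.nn as nn

# A hypothetical derivation sampled by an ENAS-style controller:
# each internal node picks one earlier node to read from and one activation.
DERIVATION = [(0, "tanh"), (0, "relu"), (1, "sigmoid"), (2, "identity")]
ACTIVATIONS = {"tanh": torch.tanh, "relu": torch.relu,
               "sigmoid": torch.sigmoid, "identity": lambda x: x}

class ENASCell(nn.Module):
    """Simplified ENAS-style recurrent cell built from a sampled derivation;
    exposes the same (x_t, h_prev) -> h_t interface as an nn.RNNCell."""
    def __init__(self, input_size, hidden_size, derivation=None):
        super().__init__()
        self.derivation = derivation or DERIVATION
        # Node 0 mixes the current input with the previous hidden state.
        self.input_proj = nn.Linear(input_size + hidden_size, hidden_size)
        # One learned transformation per internal node.
        self.node_proj = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in self.derivation)

    def forward(self, x_t, h_prev):
        nodes = [torch.tanh(self.input_proj(torch.cat([x_t, h_prev], dim=-1)))]
        for (src, act), proj in zip(self.derivation, self.node_proj):
            nodes.append(ACTIVATIONS[act](proj(nodes[src])))
        # Average the internal nodes to form the next hidden state (the real
        # ENAS cell averages only the "loose end" nodes and uses highway links).
        return torch.stack(nodes[1:], dim=0).mean(dim=0)

# Unrolling the cell over a sequence, as a drop-in for an LSTM encoder:
cell = ENASCell(input_size=300, hidden_size=256)
x = torch.randn(4, 20, 300)                      # (batch, time, embed)
h = torch.zeros(4, 256)
outputs = [h := cell(x[:, t, :], h) for t in range(x.size(1))]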

Highlights

  • Neural Architecture Search (NAS) methods aim to automatically discover neural architectures that perform well on a given task and dataset

  • Through our evaluations on paraphrase detection (PD) and semantic textual similarity (STS), we aim to study whether the Efficient Neural Architecture Search (ENAS) methods used in prior work for natural language inference (NLI) are generalizable and whether the results hold when applied to related tasks and datasets

  • ESIM models with ENAS-RNNs in both layers lag behind LSTMs by 0.9%, on average


Summary

Introduction

Neural Architecture Search (NAS) methods aim to automatically discover neural architectures that perform well on a given task and dataset. These methods search over a space of possible model architectures, looking for ones that perform well on the task and generalize to unseen data. We conduct a large set of experiments testing the effectiveness of ENAS-optimized RNN architectures across multiple models (ESIM, BiLSTM-Max), embeddings (BERT, GloVe), and datasets (MRPC, SICK, STS-B). To our knowledge, we are the first to apply ENAS to PD and STS, to explore its application across multiple embeddings and traditionally LSTM-based NLP models, and to conduct extensive hyperparameter tuning (HPT) across multiple ENAS-RNN architecture candidates.
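The following sketch illustrates, under our own simplifying assumptions, how such an experiment matrix can be organized in code: a BiLSTM-Max-style sentence-pair model takes an rnn_factory argument, so the LSTM encoder can be swapped for a searched ENAS-RNN (wrapped to return the same (outputs, state) pair) without changing the rest of the model, and the embedding dimension can be set to match GloVe (300-d) or BERT (768-d) features. The class name, rnn_factory, and all hyperparameters are hypothetical, not taken from the paper.

import torch
import torch.nn as nn

class BiLSTMMaxPairModel(nn.Module):
    """Sentence-pair model: encode both sentences, max-pool over time,
    then combine the two sentence vectors as [u; v; |u-v|; u*v]."""
    def __init__(self, embed_dim, hidden_dim, num_classes, rnn_factory=None):
        super().__init__()
        # rnn_factory lets us swap the encoder: an LSTM baseline by default,
        # or a searched ENAS-RNN wrapped to expose the same interface.
        make_rnn = rnn_factory or (lambda: nn.LSTM(
            embed_dim, hidden_dim, bidirectional=True, batch_first=True))
        self.encoder = make_rnn()
        self.classifier = nn.Sequential(
            nn.Linear(8 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes))

    def encode(self, embeds):
        outputs, _ = self.encoder(embeds)        # (batch, time, 2*hidden)
        return outputs.max(dim=1).values         # max-pool over time

    def forward(self, premise_embeds, hypothesis_embeds):
        u, v = self.encode(premise_embeds), self.encode(hypothesis_embeds)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.classifier(features)

# Usage with fixed GloVe-style (300-d) embeddings on a 2-class task (e.g. PD):
model = BiLSTMMaxPairModel(embed_dim=300, hidden_dim=256, num_classes=2)
logits = model(torch.randn(4, 20, 300), torch.randn(4, 18, 300))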

