Abstract

Recent advances in modern Natural Language Processing (NLP) research have been dominated by the combination of Transfer Learning methods with large-scale Transformer language models. With them came a paradigm shift in NLP, with the starting point for training a model on a downstream task moving from a blank task-specific model to a general-purpose pretrained architecture. Still, creating these general-purpose models remains an expensive and time-consuming process, restricting the use of these methods to a small subset of the wider NLP community. In this paper, we present Transformers, a library for state-of-the-art NLP, making these developments available to the community by gathering state-of-the-art general-purpose pretrained models under a unified API together with an ecosystem of libraries, examples, tutorials and scripts targeting many downstream NLP tasks. Transformers features carefully crafted model implementations and high-performance pretrained weights for two main deep learning frameworks, PyTorch and TensorFlow, while supporting all the necessary tools to analyze, evaluate and use these models in downstream tasks such as text/token classification, question answering and language generation, among others. Transformers has gained significant organic traction and adoption among both the researcher and practitioner communities. At Hugging Face, we are committed to pursuing the development of Transformers with the ambition of creating the standard library for building NLP systems.
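
To make the unified API mentioned above concrete, the sketch below uses the library's public pipeline helper for two of the listed downstream tasks. The example inputs are illustrative assumptions, and the default checkpoints downloaded by pipeline are whatever the installed library version selects, not choices prescribed by the paper.

    # A minimal sketch of the unified task API, assuming the public `pipeline`
    # helper from the transformers library; the inputs are illustrative only.
    from transformers import pipeline

    # Text classification (sentiment analysis) with a default pretrained checkpoint.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers makes state-of-the-art NLP accessible."))

    # Extractive question answering through the same high-level interface.
    qa = pipeline("question-answering")
    print(qa(question="What does the library gather under a unified API?",
             context="Transformers gathers state-of-the-art pretrained models "
                     "under a unified API for downstream NLP tasks."))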

Highlights

  • The Transformer (Vaswani et al., 2017) has rapidly become the dominant architecture for natural language processing, surpassing alternative neural models such as convolutional and recurrent neural networks in performance for tasks in both natural language understanding and natural language generation

  • Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining

  • The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API

Summary

Introduction

The Transformer (Vaswani et al., 2017) has rapidly become the dominant architecture for natural language processing, surpassing alternative neural models such as convolutional and recurrent neural networks in performance for tasks in both natural language understanding and natural language generation. The Transformer architecture is conducive to pretraining on large text corpora, leading to major gains in accuracy on downstream tasks including text classification (Yang et al., 2019), language understanding (Liu et al., 2019b; Wang et al., 2018, 2019), machine translation (Lample and Conneau, 2019a), coreference resolution (Joshi et al., 2019), commonsense inference (Bosselut et al., 2019), and summarization (Lewis et al., 2019), among others. This advance leads to a wide range of practical challenges that must be addressed in order for these models to be widely utilized. Transformers addresses these challenges; its philosophy is to support industrial-strength implementations of popular model variants that are easy to read, extend, and deploy. On this foundation, the library supports the distribution and usage of a wide variety of pretrained models in a centralized model hub.
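
As a concrete illustration of the model-hub workflow described above, the following sketch loads one pretrained checkpoint from the hub and instantiates it in both supported frameworks. The checkpoint name and the final shape inspection are illustrative assumptions rather than steps taken from the paper.

    # A minimal sketch of loading a pretrained model from the centralized model
    # hub. The checkpoint name "bert-base-uncased" is an illustrative choice.
    from transformers import AutoTokenizer, AutoModel, TFAutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    pt_model = AutoModel.from_pretrained("bert-base-uncased")    # PyTorch implementation
    tf_model = TFAutoModel.from_pretrained("bert-base-uncased")  # TensorFlow implementation

    # Tokenize a sentence and run the PyTorch model; the TensorFlow model is
    # used the same way with return_tensors="tf".
    inputs = tokenizer("Pretrained models lower the cost of NLP research.",
                       return_tensors="pt")
    outputs = pt_model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch size, sequence length, hidden size)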

Related Work
Library Design
Community Model Hub
Deployment
Conclusion