BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials

Xingrun Xing, Li Du, Xianlin Zeng, Jiajun Zhang, Xinyuan Wang, Yequan Wang, Zheng Zhang

doi:10.1609/aaai.v38i14.29542

Abstract

Pretrained foundation models offer substantial benefits for a wide range of downstream tasks and may be one of the most promising routes toward artificial general intelligence. However, scaling up foundation transformers to capture maximal task-agnostic knowledge has created computational challenges, especially on resource-limited devices such as mobile phones. This work proposes the first Binary Pretrained Foundation Transformer (BiPFT) for natural language understanding (NLU) tasks, which reduces the operation count by 56 times and memory by 28 times. In contrast to previous task-specific binary transformers, BiPFT substantially enhances the learning capability of binary neural networks (BNNs), bringing BNNs into the era of pre-training. Benefiting from extensive pretraining data, we further propose a data-driven binarization method. Specifically, we first analyze the binarization error in self-attention operations and derive its polynomial form. To simulate full-precision self-attention, we define the binarization error as binarization residual polynomials and then introduce low-rank estimators to model these polynomials. Extensive experiments validate the effectiveness of BiPFT, which surpasses the task-specific baseline by 15.4% in average performance on the GLUE benchmark. BiPFT also demonstrates improved robustness to hyperparameter changes, improved optimization efficiency, and reduced reliance on downstream distillation; consequently, it generalizes to various NLU tasks and simplifies the downstream pipeline of BNNs. Our code and pretrained models are publicly available at https://github.com/Xingrun-Xing/BiPFT.
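The central idea of the abstract can be illustrated numerically: binarized attention scores are split into a binary term plus a residual polynomial, and that residual is then approximated at low rank. What follows is a minimal sketch under stated assumptions, not the authors' method. Q and K are binarized with a scaled sign function whose per-row scale is the mean absolute value (an XNOR-Net-style choice assumed here). The low-rank estimator is replaced by a truncated SVD of the exact residual, which serves only as an oracle for illustration; BiPFT instead learns its low-rank estimators from pretraining data. The helper names `binarize`, `residual_polynomial`, and `low_rank_estimate` are hypothetical.

```python
import numpy as np

def binarize(x):
    # Assumed quantizer: sign(x) scaled per row by alpha = mean(|x|).
    # This is an illustrative choice, not necessarily BiPFT's exact binarizer.
    alpha = np.mean(np.abs(x), axis=-1, keepdims=True)
    return alpha * np.where(x >= 0, 1.0, -1.0)

def residual_polynomial(q, k):
    # Split each operand into a binary part and a binarization residual:
    # Q = Qb + Rq and K = Kb + Rk.
    qb, kb = binarize(q), binarize(k)
    rq, rk = q - qb, k - kb
    # Q K^T = Qb Kb^T + (Qb Rk^T + Rq Kb^T + Rq Rk^T); the bracketed
    # polynomial is exactly what purely binary scores Qb Kb^T leave out.
    err = qb @ rk.T + rq @ kb.T + rq @ rk.T
    return qb @ kb.T, err

def low_rank_estimate(err, rank):
    # Oracle stand-in: the truncated SVD is the best rank-r approximation
    # in Frobenius norm (Eckart-Young). A learned estimator would replace this.
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64))  # 16 tokens, head dimension 64
k = rng.standard_normal((16, 64))
scores_fp = q @ k.T
scores_bin, err = residual_polynomial(q, k)

# The expansion is exact: binary scores plus the residual recover Q K^T.
print(np.allclose(scores_bin + err, scores_fp))  # True

# Relative error of binary scores corrected by a rank-r residual estimate.
for r in (1, 4, 16):
    approx = scores_bin + low_rank_estimate(err, r)
    rel = np.linalg.norm(approx - scores_fp) / np.linalg.norm(scores_fp)
    print(r, round(float(rel), 3))
```

Because the expansion is exact, all remaining error comes from truncating the residual. By the Eckart-Young theorem this error cannot increase as the rank grows, and at full rank (16 here) it vanishes up to floating-point error.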

Published in: Proceedings of the AAAI Conference on Artificial Intelligence

Similar Papers

Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models
Eunchan Lee ... Sangtae Ahn
Applied Sciences, Vol. 12, 29 Apr 2022

SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu ... Chao Lou
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 24 Mar 2024

KLEJ: Comprehensive Benchmark for Polish Language Understanding
Piotr Rybak ... Ireneusz Gawlik
01 Jan 2020

On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency
...
11 May 2022