Abstract

Transfer learning is the technique of training a model on one problem and reusing it as the starting point for a related problem. It has proved to be very effective and involves two phases: the pre-training phase (generation of pre-trained models) and the adaptation phase (reuse of pre-trained models). Auto-encoding pre-trained language models are one type of pre-trained model; they use the encoder component of the transformer to perform natural language understanding. This work discusses bidirectional encoder representations from transformers (BERT), its variants, and their relative performance. BERT is a transformer-based model developed for pre-training on unlabeled text bidirectionally, i.e., by considering the context on both sides of the word being processed. It is pre-trained with two objectives: masked language modeling (MLM) and next sentence prediction (NSP). The robustly optimized BERT (RoBERTa) variant achieves significant improvements with only a few modifications, notably removing the NSP loss because of its inefficiency. SpanBERT is another variant that modifies the MLM task by masking contiguous random spans and adds a span-boundary objective (SBO) loss. A Lite BERT (ALBERT) is a further variant with two parameter-reduction techniques, factorized embedding parameterization and cross-layer parameter sharing, and it replaces NSP with an inter-sentence coherence loss. According to the available literature, these variants outperform BERT with relatively few modifications.

Keywords: Bidirectional encoder representations from transformers (BERT); Robustly optimized BERT (RoBERTa); A Lite BERT (ALBERT); Span-boundary objective (SBO); Masked language modeling (MLM); Next sentence prediction (NSP); Sequence-to-sequence (Seq2Seq)
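For readers unfamiliar with the MLM objective summarized above, the following minimal Python sketch illustrates the basic idea: hide a random fraction of tokens and record the originals as prediction targets. It is an illustrative simplification, not the implementation used by BERT or its variants; in particular, it omits BERT's strategy of sometimes substituting a random token or keeping the original token in place of the mask.

    import random

    MASK_TOKEN = "[MASK]"

    def mask_tokens(tokens, mask_prob=0.15, seed=0):
        # Illustrative MLM-style masking: replace a random subset of tokens
        # with a mask symbol and keep the originals as prediction targets.
        rng = random.Random(seed)
        masked = list(tokens)
        targets = {}  # position -> original token the model must predict
        for i, tok in enumerate(tokens):
            if rng.random() < mask_prob:
                targets[i] = tok
                masked[i] = MASK_TOKEN
        return masked, targets

    tokens = "the quick brown fox jumps over the lazy dog".split()
    masked, targets = mask_tokens(tokens, mask_prob=0.3)
    print(masked)   # input sequence with some tokens hidden
    print(targets)  # positions and original tokens to be predicted

During pre-training, the model is trained to recover the hidden tokens from the surrounding bidirectional context; SpanBERT changes which positions are masked (contiguous spans rather than individual tokens) rather than the recovery objective itself.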
