Pre-Training With Whole Word Masking for Chinese BERT

Yiming Cui, Ting Liu, Wanxiang Che, Bing Qin, Ziqing Yang

doi:10.1109/taslp.2021.3124365

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has brought substantial improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways. In particular, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, and RBT, among others. We carry out extensive experiments on ten Chinese NLP tasks to evaluate these models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also report ablations with several findings that may help future research. We open-source our pre-trained language models to support the research community.
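
The whole word masking idea named in the abstract can be illustrated with a minimal sketch. It assumes the sentence has already been segmented into words by some external Chinese word segmenter, and that each Chinese character is one token. The function name `whole_word_mask`, the default masking probability of 0.15, and the plain `[MASK]` replacement are illustrative assumptions, not details taken from the abstract; the paper's actual pre-training procedure is more involved.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumed for illustration)


def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask at word granularity: if a word is selected, every one of its
    characters is masked together, never only a part of the word.

    `words` is a list of pre-segmented words (segmentation is assumed to be
    done elsewhere). Returns (tokens, labels), where labels hold the original
    character at masked positions and None elsewhere.
    """
    if rng is None:
        rng = random.Random()
    tokens, labels = [], []
    for word in words:
        chars = list(word)  # one token per character
        if rng.random() < mask_prob:
            # Mask the whole word: all characters are replaced at once.
            tokens.extend([MASK] * len(chars))
            labels.extend(chars)
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))
    return tokens, labels


if __name__ == "__main__":
    segmented = ["使用", "语言", "模型", "来", "预测"]
    masked, targets = whole_word_mask(segmented, mask_prob=0.5,
                                      rng=random.Random(0))
    print(masked)
    print(targets)
```

The point of the sketch is the invariant: for any word, either all of its characters are masked or none are, in contrast to character-level masking, where a single character inside a word could be masked on its own.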

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2021
Citations: 704

Similar Papers

Revisiting Pre-Trained Models for Chinese Natural Language Processing
Yiming Cui ... Ting Liu
01 Jan 2020

Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method
Fenfang Li ... Han Deng
Applied Sciences | VOL. 14
02 Apr 2024

Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model
Shifeng Chen ... Ketai He
Information | VOL. 15
06 Feb 2024

Multi-Encoder Transformer for Korean Abstractive Text Summarization
Youhyun Shin
IEEE Access | VOL. 11
01 Jan 2023
