Abstract

Google's BERT is a neural network model that performs well on natural language processing tasks. Its pre-training uses two major strategies: the "Masked Language Model", which captures word-level relationships, and "Next Sentence Prediction", which captures sentence-level relationships. In the Masked Language Model, some words in a sentence are masked and BERT learns to predict the original words from the surrounding context. Two questions come to mind: why does BERT learn effectively by reading in both directions, forward and backward, and what is the difference between the two reading directions?

Consider a Masked Language Model example in which the middle sentence is "I ate [MASK] every morning." First, predict the masked word by forward reading. The preceding sentence is "Considering my health, I decided to change the breakfast menu." What is being asked here? We usually consider which word is most realistic and arrive at "an apple"; the answer concerns feasibility. Next, predict the masked word by backward reading, focusing on the middle sentence and the following sentence, "A month later, I lost 3 kg and became healthy." What is being asked now? In such a situation we are looking for success factors, so we consider which word is most relevant and again arrive at "an apple"; the answer concerns causality. Therefore, BERT learns to predict feasibility by forward reading and causality by backward reading.

This bidirectional reading technique can also be applied to scenario planning that uses back-casting from the future. Scenario planning means making assumptions about what the future is going to be. A scenario can be described in two ways: fore-casting and back-casting. Fore-casting means viewing the future from the present. In general, back-casting means viewing the past from the present, but in this paper it means viewing the present from the future. Just as bidirectional reading yields two different predictions, there is a big difference between fore-casting into the future and back-casting from the future. How does a scenario written by fore-casting feel? You tend to focus on feasibility, so a long debate starts about feasibility: "Is it possible?", "Is it difficult?", "How can we achieve it?" On the other hand, how does a scenario written by back-casting from the future to the present feel? Because of back-casting, it has to be written in the past tense. Written in the past tense, the scenario feels as if everything has already happened and someone has resolved all the problems by that time. The surrounding past-tense words shift your mindset from prediction to event context, so you tend to focus on the causal factors of success and can escape the long debate.

Scenario planning that back-casts from the future to the present therefore produces a good proposition, free from the long discussions. Moreover, regarding the mysteries of deep learning, each answer lies in human thinking mechanisms, because AI is created by imitating the human brain.
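The masked-word example above can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (neither is named in the paper): it asks BERT to fill the mask given only the preceding sentence (forward reading), only the following sentence (backward reading), and the full bidirectional context.

```python
from transformers import pipeline

# Minimal sketch of the abstract's masked-word example. The fill-mask
# pipeline and the bert-base-uncased checkpoint are assumptions for
# illustration; the paper does not prescribe a specific implementation.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

previous = "Considering my health, I decided to change the breakfast menu."
middle = "I ate [MASK] every morning."
following = "A month later, I lost 3 kg and became healthy."

# Forward reading: only the preceding sentence constrains the mask.
# Backward reading: only the following sentence constrains the mask.
# Bidirectional: BERT sees both at once, as it does during pre-training.
contexts = {
    "forward": f"{previous} {middle}",
    "backward": f"{middle} {following}",
    "bidirectional": f"{previous} {middle} {following}",
}

for name, text in contexts.items():
    top = fill_mask(text, top_k=3)  # three highest-probability fillers
    words = ", ".join(f"{c['token_str']} ({c['score']:.3f})" for c in top)
    print(f"{name:>13}: {words}")
```

Comparing the three candidate lists gives an empirical handle on the claim above: the forward context should favor plausible breakfast foods (feasibility), while the backward context should favor words that explain the outcome (causality).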
