Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks

Tingyu Xia, Yuan Tian, Yue Wang, Yi Chang

doi:10.1145/3442381.3449988

Abstract

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs the most and where it is most needed. The analysis further motivates us to take a different approach from most existing work. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT's multi-head attention mechanism. This leads to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks other than the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT consistently improves semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.
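The abstract does not say exactly how the knowledge enters the attention computation, so the following is only a minimal sketch of one plausible reading: a token-pair prior added to the scaled dot-product scores before the softmax. The additive form, the `prior` matrix, and the `alpha` strength parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, prior=None, alpha=1.0):
    """Scaled dot-product attention weights with an optional prior bias.

    queries, keys: lists of equal-length vectors, one per token.
    prior: hypothetical token-pair similarity matrix (assumption), where
           prior[i][j] is large when tokens i and j are known to be related.
    alpha: hypothetical strength of the prior (assumption, not from the paper).
    """
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        # Standard scaled dot-product score between token i and every token j.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if prior is not None:
            # Assumed injection step: bias the scores toward related pairs.
            scores = [s + alpha * prior[i][j] for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    return weights


# Toy input: three tokens with identical vectors, so plain attention is uniform.
tokens = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
# Hypothetical prior: tokens 0 and 2 are marked as related (e.g. synonyms).
prior = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

plain = attention_weights(tokens, tokens)
guided = attention_weights(tokens, tokens, prior=prior, alpha=2.0)
```

In this toy case the prior shifts attention mass from token 0 toward token 2 without any extra training step, which is the property the abstract highlights. Rows with an all-zero prior are left unchanged.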

Similar Papers

Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
Rajesh Gupta
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3
02 Mar 2024

Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT)
Jia Li ... Wenjuan Liu
BMC Medical Informatics and Decision Making | VOL. 22
30 Jul 2022

Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT
Shoya Wada ... Yasushi Matsumura
Artificial Intelligence In Medicine | VOL. 153
05 May 2024

Natural Language Processing with “More Than Words – BERT”
Saranlita Chotirat ... Phayung Meesad
01 Jan 2020
