Improving text mining in plant health domain with GAN and/or pre-trained language model.

Shufan Jiang, Rafael Angarita, Francis Rousseaux, Stéphane Cormier

doi:10.3389/frai.2023.1072329

Abstract

The Bidirectional Encoder Representations from Transformers (BERT) architecture offers a cutting-edge approach to Natural Language Processing. It involves two steps: 1) pre-training a language model to extract contextualized features and 2) fine-tuning for specific downstream tasks. Although pre-trained language models (PLMs) have been successful in various text-mining applications, challenges remain, particularly in areas with limited labeled data, such as plant health hazard detection from individuals' observations. To address this challenge, we propose to combine GAN-BERT, a model that extends the fine-tuning process with unlabeled data through a Generative Adversarial Network (GAN), with ChouBERT, a domain-specific PLM. Our results show that GAN-BERT outperforms traditional fine-tuning in multiple text classification tasks. In this paper, we examine the impact of further pre-training on the GAN-BERT model. We experiment with different hyperparameters to determine the best combination of models and fine-tuning parameters. Our findings suggest that the combination of GAN and ChouBERT can enhance the generalizability of the text classifier but may also lead to increased instability during training. Finally, we provide recommendations to mitigate these instabilities.
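To make the idea of extending fine-tuning with unlabeled data through a GAN more concrete, the following is a minimal sketch of one common formulation of the semi-supervised GAN objective that GAN-BERT-style models build on. It is not taken from the paper and is not the authors' implementation. It assumes a discriminator that outputs k + 1 logits per example (k real classes plus a final "fake" class), and it omits the feature-matching term that is often added to the generator loss. The function names and the `None` convention for unlabeled examples are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean(values):
    """Arithmetic mean; returns 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def discriminator_loss(real_logits, real_labels, fake_logits, eps=1e-8):
    """Supervised + unsupervised discriminator loss (illustrative sketch).

    real_logits: list of (k + 1)-length logit lists for real examples.
    real_labels: class index in [0, k) for labeled examples, None if unlabeled.
    fake_logits: list of (k + 1)-length logit lists for generated examples.
    The last logit position is assumed to be the "fake" class.
    """
    supervised, unsup_real = [], []
    for logits, label in zip(real_logits, real_labels):
        probs = softmax(logits)
        # Every real example, labeled or not, should NOT be flagged as fake.
        unsup_real.append(-math.log(1.0 - probs[-1] + eps))
        # Only labeled examples contribute to the classification loss.
        if label is not None:
            supervised.append(-math.log(probs[label] + eps))
    # Generated examples should be flagged as fake.
    unsup_fake = [-math.log(softmax(l)[-1] + eps) for l in fake_logits]
    return mean(supervised) + mean(unsup_real) + mean(unsup_fake)


def generator_loss(fake_logits, eps=1e-8):
    """Generator is rewarded when its examples are not flagged as fake."""
    return mean(-math.log(1.0 - softmax(l)[-1] + eps) for l in fake_logits)
```

In this sketch, unlabeled observations still shape the discriminator through the real-versus-fake terms, which is how such a setup can make use of data that has no class label.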

Journal: Frontiers in Artificial Intelligence
Publication Date: Feb 21, 2023
Citations: 5
License type: CC BY 4.0

