Pre-training large language models based on Transformer architecture for building industry application: A review
- Book Chapter
12
- 10.1007/978-981-19-4453-6_2
- Jan 1, 2022
The remarkable progress in Natural Language Processing (NLP) brought about by deep learning, particularly with the recent advent of large pre-trained neural language models, has come under scrutiny as several studies have begun to discuss and report potential biases in NLP applications. Bias in NLP is found to originate from latent historical biases encoded by humans into textual data, which get perpetuated or even amplified by NLP algorithms. We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which it occurs in these models, and the various ways in which it can be quantified and mitigated. Considering the wide applicability of textual affective computing in real-world downstream systems such as business, healthcare, and education, we place special emphasis on investigating bias in the context of affect (emotion), i.e., Affective Bias, in large pre-trained language models. We summarize various bias evaluation corpora that can aid future research and discuss open challenges in the study of bias in pre-trained language models. We believe that our attempt to draw a comprehensive view of bias in pre-trained language models, and especially the exploration of affective bias, will be highly beneficial to researchers interested in this evolving field.
- Conference Article
2
- 10.1109/icmlant53170.2021.9690554
- Dec 16, 2021
We propose a fine-tuning methodology and a comprehensive comparison of state-of-the-art pre-trained language models (PLMs) applied to Vietnamese Sentiment Analysis. The fine-tuning architecture includes three main components: (1) pre-processing, (2) a pre-trained language model, and (3) a multi-layer perceptron. The method exploits pre-trained contextual language models to represent input sentences. The pre-trained contextual language models belong to three different kinds: multilingual, cross-lingual, and monolingual. We conduct experiments to evaluate classifiers fine-tuned with five different contextual language models. The experimental results on two open-access datasets show that the sentiment classifiers trained with the monolingual language model outperform those trained with the cross-lingual and multilingual language models. The results provide additional evidence of the representation power of monolingual PhoBERT in comparison with multilingual BERT and cross-lingual XLM.
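A minimal sketch of this three-component architecture (pre-processing, pre-trained contextual encoder, MLP classifier head) could look as follows; the checkpoint name vinai/phobert-base, the three-class label set, and the omission of Vietnamese word segmentation in pre-processing are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the described pipeline, NOT the authors' code.
# Assumptions: checkpoint "vinai/phobert-base", 3 sentiment classes,
# pre-processing (Vietnamese word segmentation) omitted for brevity.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PLMSentimentClassifier(nn.Module):
    def __init__(self, plm_name="vinai/phobert-base", num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        hidden = self.encoder.config.hidden_size
        # Multi-layer perceptron head on the first-token representation.
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        sentence_repr = out.last_hidden_state[:, 0]   # <s> token embedding
        return self.mlp(sentence_repr)                # class logits

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = PLMSentimentClassifier()
batch = tokenizer(["Sản phẩm này rất tốt"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```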
- Book Chapter
3
- 10.1007/978-981-19-7960-6_10
- Jan 1, 2022
Multi-language generative models are considered an important part of the multilingual field and have received extensive attention in recent years. However, due to the scarcity of Chinese Minority corpora, developing a well-designed translation system is still a great challenge. To better leverage the available corpora, we design a pre-training method for the low-resource domain that helps the model better understand low-resource text. The motivation is that the Chinese Minority languages share characteristics of similarity and adjacency of cultural transmission, so different multilingual translation pairs can provide the pre-trained model with rich semantic information. We therefore propose the Chinese Minority Pre-Trained (CMPT) language model with multi-task and multi-stage strategies to further leverage these low-resource corpora. Specifically, four pre-training tasks and a two-stage strategy are adopted during pre-training for better results. Experiments show that our model outperforms the baseline method in Chinese Minority language translation. At the same time, we release the first generative pre-trained language model for the Chinese Minority languages to support the development of relevant research (all experimental code and the pre-trained language model are open-sourced at https://github.com/WENGSYX/CMPT). Keywords: Multi-task, Multi-stage, Chinese minority, Generative pre-trained language model.
- Conference Article
- 10.1145/3447548.3470810
- Aug 14, 2021
Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related applications. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch.
- Research Article
- 10.62051/ijcsit.v3n2.34
- Jul 19, 2024
- International Journal of Computer Science and Information Technology
This paper systematically reviews the aspect-based sentiment analysis techniques that integrate data augmentation and pre-trained language models. Aspect-based sentiment analysis aims to identify the sentiment tendency of specific aspects in texts. Traditional methods face challenges such as data sparsity and insufficient model generalization. Data augmentation and pre-trained language models bring opportunities to solve these problems. Data augmentation can alleviate data sparsity, and pre-trained language models have powerful feature extraction and transfer learning capabilities. This paper elaborates on the task definition of aspect-based sentiment analysis, focusing on specific methods based on data augmentation and pre-trained language models, including data augmentation strategies and methods, as well as methods based on pre-trained language models such as BERT, RoBERTa, BART, and XLNet, and explores how to combine data augmentation and pre-trained models to improve the performance of aspect-level sentiment analysis. Finally, it is pointed out that there are still some challenges and opportunities in this field, such as the diversity of data augmentation techniques, optimization of pre-trained models, multimodal sentiment analysis, interpretability, and credibility, which need to be further explored.
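As one concrete illustration of the augmentation strategies such a review covers, the sketch below applies EDA-style synonym replacement around a protected aspect term; the helper function and the English example are hypothetical and are not drawn from any specific surveyed method.

```python
# Hypothetical EDA-style synonym replacement for aspect-based sentiment data.
# Requires the WordNet corpus: nltk.download("wordnet").
import random
from nltk.corpus import wordnet

def synonym_replace(sentence, aspect, n_swaps=1):
    """Perturb context words while keeping the aspect term intact."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.lower() != aspect.lower()]
    random.shuffle(candidates)
    for i in candidates[:n_swaps]:
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wordnet.synsets(words[i])
                    for lemma in synset.lemmas()}
        synonyms.discard(words[i])
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return " ".join(words)

print(synonym_replace("the battery life is great but the screen is dim", "battery"))
```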
- Research Article
1
- 10.1016/j.nlp.2024.100062
- Mar 5, 2024
- Natural Language Processing Journal
Understanding latent affective bias in large pre-trained neural language models
- Research Article
6
- 10.1016/j.jbi.2023.104486
- Sep 16, 2023
- Journal of Biomedical Informatics
A self-supervised language model selection strategy for biomedical question answering
- Conference Article
2
- 10.1109/ccai55564.2022.9807755
- May 6, 2022
Korean is the native and official language of the Chinese-Korean people, and Weibo is a social media platform with a huge number of users in China. Currently, there are few studies on sentiment analysis of Korean-language Weibo posts by Chinese-Korean users. In this paper, we propose a sentiment classification method for Chinese-Korean Weibo based on a pre-trained language model and transfer learning. First, we crawled Chinese-Korean Weibo data from Sina Weibo and labeled it with sentiment to build the Chinese-Korean Weibo sentiment analysis (CKWSA) dataset. Second, to address the small number of training samples in the CKWSA dataset, we fine-tune a classifier based on a pre-trained Korean language model on a Korean Twitter sentiment analysis dataset to obtain a Korean Twitter sentiment classification model, and then further fine-tune that model on the CKWSA dataset to obtain the Chinese-Korean Weibo sentiment classification model. The experiments show that the proposed classification method based on a pre-trained language model and transfer learning performs well and improves over other baselines on the Chinese-Korean Weibo sentiment analysis dataset.
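A minimal sketch of this two-stage transfer-learning recipe (fine-tune on the large Korean Twitter corpus, then on the small CKWSA set) might look as follows; the checkpoint beomi/kcbert-base, the binary label set, and the toy corpora are stand-ins, since the CKWSA data is not public.

```python
# Minimal sketch of the two-stage fine-tuning, NOT the authors' code.
# Assumptions: Korean checkpoint "beomi/kcbert-base", binary labels,
# and tiny toy corpora standing in for the Twitter and CKWSA datasets.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

plm = "beomi/kcbert-base"
tokenizer = AutoTokenizer.from_pretrained(plm)
model = AutoModelForSequenceClassification.from_pretrained(plm, num_labels=2)

def to_dataset(texts, labels):
    # Stand-in loader; replace with the real Korean Twitter / CKWSA corpora.
    enc = tokenizer(texts, truncation=True, padding=True)
    return Dataset.from_dict({**enc, "labels": labels})

def finetune(model, train_set, out_dir, epochs):
    args = TrainingArguments(output_dir=out_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=train_set).train()
    return model

# Stage 1: fine-tune on the (large) Korean Twitter sentiment corpus.
model = finetune(model, to_dataset(["정말 좋아요", "별로예요"], [1, 0]),
                 "stage1_twitter", epochs=1)
# Stage 2: continue fine-tuning on the (small) Chinese-Korean Weibo corpus.
model = finetune(model, to_dataset(["아주 만족합니다", "실망했어요"], [1, 0]),
                 "stage2_ckwsa", epochs=1)
```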
- Conference Article
6
- 10.1109/slt54892.2023.10022399
- Jan 9, 2023
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively. Our code and models will be publicly available as part of the ESPnet-SLU toolkit.
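The sketch below illustrates the general idea of reusing frozen self-supervised speech and text encoders as feature extractors for an SLU classifier; the checkpoints, the fused linear head, and the dummy audio are illustrative assumptions rather than the SLUE / ESPnet-SLU recipe.

```python
# Minimal sketch of frozen pre-trained encoders as SLU feature extractors;
# checkpoints, the fused linear head, and the dummy audio are assumptions,
# not the SLUE / ESPnet-SLU recipe.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer, Wav2Vec2Model

speech_enc = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
text_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
text_enc = AutoModel.from_pretrained("bert-base-uncased")

# Task head, e.g. 3-way sentiment classification (label set assumed).
head = nn.Linear(speech_enc.config.hidden_size + text_enc.config.hidden_size, 3)

waveform = torch.randn(1, 16000)                 # 1 s of dummy 16 kHz audio
with torch.no_grad():
    speech_feat = speech_enc(waveform).last_hidden_state.mean(dim=1)
    transcript = text_tok("book a table for two", return_tensors="pt")
    text_feat = text_enc(**transcript).last_hidden_state[:, 0]
logits = head(torch.cat([speech_feat, text_feat], dim=-1))
```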
- Research Article
7
- 10.1016/j.procs.2023.08.147
- Jan 1, 2023
- Procedia Computer Science
Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI
- Research Article
- 10.1142/s2196888823500173
- Dec 7, 2023
- Vietnam Journal of Computer Science
Sentiment Analysis (SA) has attracted increasing research attention in recent years. Most existing works tackle the SA task by fine-tuning single pre-trained language models combined with specific layers. Despite their effectiveness, the previous studies overlooked the utilization of feature representations from various contextual language models. Ensemble learning techniques have garnered increasing attention within the field of Natural Language Processing (NLP). However, there is still room for improvement in ensemble models for the SA task, particularly in the aspect-level SA task. Furthermore, the utilization of heterogeneous ensembles, which combine various pre-trained transformer-based language models, may prove beneficial in enhancing overall performance by incorporating diverse linguistic representations. This paper introduces two ensemble models that leverage soft voting and feature fusion techniques by combining individual pre-trained transformer-based language models for the SA task. The latest transformer-based models, including PhoBERT, XLM, XLM-Align, InfoXLM, and viBERT_FPT, are employed to integrate knowledge and representations using feature fusion and a soft voting strategy. We conducted extensive experiments on various Vietnamese benchmark datasets, encompassing sentence-level, document-level, and aspect-level SA. The experimental results demonstrate that our approaches outperform most existing methods, achieving new state-of-the-art results with F1-weighted scores of 94.03%, 95.65%, 75.36%, and 76.23% on the UIT_VSFC, Aivivn, UIT_ABSA for the restaurant domain, and UIT_ViSFD datasets, respectively.
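A minimal sketch of the soft-voting variant is shown below: each member classifier produces class probabilities, which are averaged before taking the argmax. The base checkpoints and the three-class label set are stand-ins for the fine-tuned Vietnamese classifiers (PhoBERT, XLM-R, viBERT, ...) used in the paper.

```python
# Minimal sketch of soft voting over transformer classifiers, NOT the paper's code.
# Replace the base checkpoints with fine-tuned sentiment models in practice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

member_names = ["vinai/phobert-base", "xlm-roberta-base"]
members = [(AutoTokenizer.from_pretrained(n),
            AutoModelForSequenceClassification.from_pretrained(n, num_labels=3))
           for n in member_names]

def soft_vote(text, members):
    probs = []
    for tok, mdl in members:
        batch = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs.append(torch.softmax(mdl(**batch).logits, dim=-1))
    # Soft voting: average the per-model class distributions, then pick the argmax.
    return torch.stack(probs).mean(dim=0).argmax(dim=-1).item()

print(soft_vote("Chất lượng sản phẩm rất tốt", members))
```

The feature-fusion variant would instead concatenate the members' sentence representations and train a single classifier on top of the fused vector.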
- Conference Article
36
- 10.1145/3442442.3451375
- Apr 19, 2021
Neural networks for language modeling have proven effective on several sub-tasks of natural language processing. Training deep language models, however, is time-consuming and computationally intensive. Pre-trained language models such as BERT are thus appealing since (1) they yield state-of-the-art performance, and (2) they relieve practitioners of the burden of preparing the adequate resources (time, hardware, and data) to train models. Nevertheless, because pre-trained models are generic, they may underperform on specific domains. In this study, we investigate the case of multi-class text classification, a task that is relatively less studied in the literature evaluating pre-trained language models. Our work is further placed under the industrial settings of the financial domain. We thus leverage generic benchmark datasets from the literature and two proprietary datasets from our partners in the financial technological industry. After highlighting a challenge for generic pre-trained models (BERT, DistilBERT, RoBERTa, XLNet, XLM) in classifying a portion of the financial document dataset, we investigate the intuition that a specialized pre-trained model for financial documents, such as FinBERT, should be leveraged. Nevertheless, our experiments show that the FinBERT model, even with an adapted vocabulary, does not lead to improvements over the generic BERT models.
- Conference Article
51
- 10.18653/v1/2022.acl-long.72
- Jan 1, 2022
Human-like biases and undesired social stereotypes exist in large pretrained language models. Given the wide adoption of these models in real-world applications, mitigating such biases has become an emerging and important task. In this paper, we propose an automatic method to mitigate the biases in pretrained language models. Different from previous debiasing work that uses external corpora to fine-tune the pretrained models, we instead directly probe the biases encoded in pretrained models through prompts. Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups. Given the identified biased prompts, we then propose a distribution alignment loss to mitigate the biases. Experiment results on standard datasets and metrics show that our proposed Auto-Debias approach can significantly reduce biases, including gender and racial bias, in pretrained language models such as BERT, RoBERTa and ALBERT. Moreover, the improvement in fairness does not decrease the language models' understanding abilities, as shown using the GLUE benchmark.
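A heavily simplified sketch of the distribution-alignment idea is given below: for a (previously searched) biased prompt, the masked-token distributions obtained with different demographic terms are pulled together via a Jensen-Shannon-style divergence. The hand-written prompt, the two-group setup, and the exact loss form are assumptions; the paper's Auto-Debias implementation differs in detail.

```python
# Heavily simplified sketch of the alignment idea, NOT the Auto-Debias code.
# Assumptions: bert-base-uncased, a single hand-written prompt, two groups,
# and a Jensen-Shannon-style divergence as the alignment loss.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def mask_distribution(sentence):
    # Distribution over the vocabulary for the [MASK] slot of `sentence`.
    batch = tok(sentence, return_tensors="pt")
    mask_pos = (batch["input_ids"] == tok.mask_token_id).nonzero()[0, 1]
    return F.softmax(mlm(**batch).logits[0, mask_pos], dim=-1)

def alignment_loss(prompt_template, groups=("he", "she")):
    dists = [mask_distribution(prompt_template.format(g)) for g in groups]
    mean = torch.stack(dists).mean(dim=0)
    # Average KL of each group's distribution to the mixture (JS-style).
    return sum(F.kl_div(mean.log(), d, reduction="sum") for d in dists) / len(dists)

loss = alignment_loss("{} works as a [MASK].")
loss.backward()   # gradients reach the pretrained weights for debiasing updates
```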
- Research Article
2
- 10.5282/ubm/epub.75928
- Jun 25, 2020
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Mainly three forces are driving the improvements in this area of research: More elaborate architectures make better use of contextual information; instead of simply plugging in static pre-trained representations, these are learned from the surrounding context in end-to-end trainable models with more intelligently designed language modelling objectives. Along with this, larger corpora are used as resources for pre-training large language models in a self-supervised fashion, which are afterwards fine-tuned on supervised tasks. Advances in parallel computing as well as in cloud computing have made it possible to train models with growing capacities in the same or even shorter time than previously established models. These three developments agglomerate in new state-of-the-art (SOTA) results being revealed at an ever higher frequency. It is not always obvious where these improvements originate, as it is not possible to completely disentangle the contributions of the three driving forces. We set out to provide a clear and concise overview of several large pre-trained language models that achieved SOTA results in the last two years, with respect to their use of new architectures and resources. We want to clarify for the reader where the differences between the models lie, and we furthermore attempt to gain some insight into the individual contributions of lexical/computational improvements as well as of architectural changes. We explicitly do not intend to quantify these contributions, but rather see our work as an overview that identifies potential starting points for benchmark comparisons. Furthermore, we tentatively point at potential possibilities for improvement in the field of open-sourcing and reproducible research.