Abstract

As an essential task for the architecture, engineering, and construction (AEC) industry, extracting and processing information from unstructured textual data with natural language processing (NLP) is gaining increasing attention. Although deep learning (DL) models for NLP tasks have been investigated for years, domain-specific pretrained DL models and their advantages have seldom been examined in the AEC domain. Therefore, this work develops large-scale domain corpora and pretrained domain-specific language models for the AEC domain, and then systematically evaluates various transfer learning and fine-tuning techniques to assess the performance of the pretrained DL models on various NLP tasks. First, both in-domain and close-domain Chinese corpora are developed. Then, two types of pretrained models, static word embedding models and contextual word embedding models, are pretrained on the various domain corpora. Finally, several widely used DL models for NLP tasks are further trained and tested on top of the various pretrained models. The results show that domain corpora further improve the performance of both static word embedding-based and contextual word embedding-based DL models on text classification (TC) and named entity recognition (NER) tasks. Meanwhile, contextual word embedding-based DL models significantly outperform static word embedding-based DL methods on TC and NER, with maximum F1-score improvements of 8.1% and 3.8%, respectively. This research contributes to the body of knowledge in two ways: (1) demonstrating the advantages of domain corpora and pretrained DL models, and (2) releasing the first domain-specific dataset and pretrained language models, named ARCBERT, for the AEC domain. Thus, this work sheds light on the adoption and application of pretrained models in the AEC domain.
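To make the two pretraining routes in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The BIO tag set, toy sentences, and all hyperparameters are hypothetical, and the public "bert-base-chinese" checkpoint stands in for the domain-pretrained ARCBERT weights, which this sketch does not assume to be available on a model hub.

```python
# Illustrative sketch of the two pretraining routes described in the abstract.
# All data, tags, and hyperparameters are hypothetical placeholders.

import torch
from gensim.models import Word2Vec
from transformers import AutoTokenizer, AutoModelForTokenClassification

# --- Route 1: static word embeddings pretrained on a (toy) domain corpus ---
sentences = [["混凝土", "梁", "应", "符合", "设计", "要求"],
             ["钢筋", "保护层", "厚度", "应", "满足", "规范"]]
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=1, workers=4)
vector = w2v.wv["混凝土"]  # 300-d static vector, fixed regardless of context

# --- Route 2: contextual model fine-tuned for NER (token classification) ---
labels = ["O", "B-COMP", "I-COMP"]  # hypothetical BIO tags for AEC entities
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels))

# One toy gradient step; a real run iterates over a labeled domain dataset.
enc = tokenizer("混凝土梁应符合设计要求", return_tensors="pt")
tags = torch.zeros(enc["input_ids"].shape, dtype=torch.long)  # all "O" here
loss = model(**enc, labels=tags).loss
loss.backward()
```

The contrast between the two routes mirrors the reported results: the static vector from Route 1 is the same in every sentence, whereas the contextual model in Route 2 produces token representations conditioned on the surrounding text, which is what the F1 gains on TC and NER are attributed to.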
