A Collaborative AI-Enabled Pretrained Language Model for AIoT Domain Question Answering

Hongyin Zhu, Ahmed Ghoneim, M Shamim Hossain, Prayag Tiwari

doi:10.1109/tii.2021.3097183

Abstract

Large-scale knowledge in the Artificial Intelligence of Things (AIoT) field urgently needs effective models that can understand human language and automatically answer questions. Pre-trained language models (PLMs) achieve state-of-the-art performance on some question answering (QA) datasets, but few models can answer questions about AIoT domain knowledge. The AIoT domain currently lacks sufficient QA datasets and large-scale pre-training corpora. We propose RoBERTa-AIoT to address the lack of high-quality, large-scale labeled AIoT QA datasets. We construct an AIoT corpus and use it to further pre-train RoBERTa and BERT. RoBERTa-AIoT and BERT-AIoT leverage unsupervised pre-training on a large corpus of AIoT-oriented Wikipedia webpages to learn more domain-specific context and improve performance on AIoT QA tasks. To fine-tune and evaluate the models, we construct three AIoT QA datasets from community QA websites. We evaluate our approach on these datasets, and the experimental results demonstrate that it yields significant improvements.
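The unsupervised further pre-training mentioned in the abstract relies, for BERT-style models, on a masked language modeling objective: some tokens are hidden and the model is trained to recover them. The following is a minimal sketch of that token-corruption step only, using the standard 80/10/10 masking rule from BERT. It is not the authors' code; the function name `mask_tokens`, the toy vocabulary, and the example sentence are hypothetical, and the 15% masking rate is the common default rather than a value confirmed by this abstract.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "<mask>"  # RoBERTa's mask symbol; BERT uses "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,  # assumed default, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with mask symbol
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


if __name__ == "__main__":
    # Hypothetical domain sentence and toy vocabulary, for illustration only.
    sentence = "a zigbee gateway forwards sensor readings to the cloud".split()
    toy_vocab = sorted(set(sentence))
    corrupted, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(labels)
```

Calling `mask_tokens` afresh each time a sentence is sampled gives a different mask on every pass, which corresponds to the dynamic masking used by RoBERTa. In practice a tokenizer would produce subword IDs rather than whitespace-split words, and a library data collator would typically handle this step.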

Full Text