Abstract

Online open-source knowledge repositories such as Wikipedia have become an increasingly important source for users to access knowledge. However, due to their large volume, it is challenging to evaluate the quality of Wikipedia articles manually. To fill this gap, we propose a novel approach named "feature fusion-based stack learning" to assess the quality of Wikipedia articles. Pre-trained language models, including BERT (Bidirectional Encoder Representations from Transformers) and ELMo (Embeddings from Language Models), are applied to extract semantic information from Wikipedia content. A feature fusion framework consisting of semantic and statistical features is built and fed into an out-of-sample (OOS) stacking model, which includes both machine learning and deep learning models. We extensively compare the performance of the proposed model with existing models on different metrics, and conduct ablation studies to demonstrate the effectiveness of our feature fusion framework and OOS stacking. Overall, the experiments show that our method substantially outperforms state-of-the-art models.
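The abstract's pipeline (fuse semantic and statistical features, then train an out-of-sample stacking ensemble) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's implementation: the base and meta models here (random forest, logistic regression) are generic stand-ins for the paper's unspecified machine learning and deep learning components, and the feature matrices are synthetic placeholders for BERT/ELMo embeddings and hand-crafted statistics.

```python
# Sketch of feature fusion + out-of-sample (OOS) stacking.
# All model choices and data here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.RandomState(0)

# Placeholders for the two feature groups: "semantic" features
# (e.g. pooled BERT/ELMo embeddings) and statistical features
# (e.g. article length, number of references).
X_semantic, y = make_classification(n_samples=300, n_features=32,
                                    random_state=0)
X_statistical = rng.normal(size=(300, 8))

# Feature fusion: concatenate the two groups into one matrix.
X = np.hstack([X_semantic, X_statistical])

# OOS stacking: each base model's predictions for the training set
# come from cross-validation folds it was not trained on, so the
# meta-learner never sees leaky in-sample predictions.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    LogisticRegression(max_iter=1000),
]
oos_preds = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# The meta-learner combines the base models' OOS predictions.
meta_learner = LogisticRegression().fit(oos_preds, y)
print(meta_learner.score(oos_preds, y))
```

The key design point is that `cross_val_predict` yields held-out predictions for every training example, which is what distinguishes out-of-sample stacking from naively training the meta-learner on in-sample base-model outputs.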
