Abstract
Article quality has always been a major concern for Wikipedia. To improve article quality, it is critical to first identify defects. Thus, flaw classification has attracted considerable attention. To achieve this, several machine-learning-based approaches are available, including deep learning models based on either manually constructed or autoextracted features. However, adopting only features of either single type may not ensure a comprehensive description of articles. To improve flaw classification, we propose a feature fusion framework combining both handcrafted and autoextracted features. In this research, we first use a rule-based method from a previously proposed framework to extract handcrafted features. Additionally, we obtain autoextracted features using Bidirectional Encoder Representations from Transformers (BERT) and various deep learning models, including bidirectional long short-term memory (Bi LSTM), bidirectional gated recurrent unit (Bi GRU), bidirectional recurrent neural network (Bi RNN), and multihead self-attention models. Finally, the handcrafted features are standardized and concatenated with the autoextracted features. Then, the concatenated features are fed into a feedforward neural network for classification. A detailed comparison of different classifiers is conducted. We compare 12 different classifiers in terms of training performance, classification performance, and model training time. The experiments show that the proposed feature fusion framework can notably improve the effectiveness of quality flaw classification for Wikipedia articles. In particular, a Bi GRU model based on the proposed framework achieves excellent classification accuracy.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.