Abstract

The quality of Wikipedia articles remains a major concern in all languages. Wikipedia relies mostly on human editors and administrators to maintain the quality of its content, but the sheer volume of that content makes manually locating every article that needs attention very time-consuming. Automatic quality detection is therefore needed to help users evaluate the quality of articles. In this paper, we propose a feature set to be applied to Wikipedia articles in ASEAN languages. We investigate statistical features: the number of links, number of infoboxes, length of the article, number of headings, number of files, number of contributors, number of viewers, number of versions of the article in other languages, and number of templates applied in the article. The experiments are performed using the Naive Bayes and Decision tree algorithms. We found that the accuracy of the Decision tree (96.34%) outperforms that of Naive Bayes (86.47%). Moreover, we found that the statistical features play an important role in the quality classification of Vietnamese, Indonesian, Malaysian, Thai, and Tagalog/Philippines Wikipedia articles.
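To make the feature set more concrete, the following is a minimal sketch of how the features that can be read directly from an article's source might be counted. It is not the extraction procedure used in the paper, which the abstract does not describe. The function name `extract_features`, the regular expressions, and the sample text are all illustrative assumptions, and the patterns handle only common wikitext forms such as `[[Target|label]]`, `{{Template|...}}`, and `== Heading ==`. The number of contributors, viewers, and other-language versions come from page metadata rather than the article text, so they are left out here.

```python
import re

# Hypothetical helper: counts text-derived features from raw wikitext.
# The patterns are simplifying assumptions, not the paper's actual method.
def extract_features(wikitext: str) -> dict:
    # Target of every [[...]] construct (text before the first '|' or ']').
    link_targets = re.findall(r"\[\[\s*([^\[\]|]+)", wikitext)
    # File embeds are [[File:...]] or [[Image:...]]; everything else is a link.
    is_file = lambda t: t.strip().lower().startswith(("file:", "image:"))
    files = [t for t in link_targets if is_file(t)]
    links = [t for t in link_targets if not is_file(t)]

    # Name of every {{...}} template invocation.
    template_names = re.findall(r"\{\{\s*([^{}|]+)", wikitext)
    # Infoboxes are templates whose name starts with "Infobox";
    # they are counted here and also included in the template total.
    infoboxes = [n for n in template_names
                 if n.strip().lower().startswith("infobox")]

    # Headings are lines of the form "== Title ==" (levels 2 to 6).
    headings = re.findall(
        r"^(={2,6})[ \t]*[^=\n]+?[ \t]*\1[ \t]*$", wikitext, re.MULTILINE
    )

    return {
        "links": len(links),
        "infoboxes": len(infoboxes),
        "length": len(wikitext),  # article length in characters
        "headings": len(headings),
        "files": len(files),
        "templates": len(template_names),
    }


# Made-up sample article used only to exercise the function.
sample = (
    "{{Infobox country|name=Thailand}}\n"
    "'''Thailand''' is a country in [[Southeast Asia]].\n"
    "== History ==\n"
    "[[File:Map.png|thumb|Map]]\n"
    "See [[Bangkok|the capital]] {{citation needed}}\n"
    "=== Early period ===\n"
)
features = extract_features(sample)
print(features)
```

A feature dictionary like this, combined with the metadata-based counts, would form one row of the input to a classifier such as Naive Bayes or a decision tree.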
