Well-written articles shape readers' interaction with information, as top-ranked articles are more likely to be seen than those further down the rankings. We present a new approach to classifying Wikipedia articles along several quality dimensions, harnessing knowledge gained from expert assessments. The study also aims to develop a sound framework for evaluating article quality, with the goal of ensuring content integrity in the distributed structure of the Internet and providing input for digital forensics applications. In the proposed method, article details are gathered through the Wikipedia API, and a well-defined set of metrics is used to store and analyze this information. The methodology then explores the relationship between the independent variables (the article metrics) and the dependent variable (the quality level assigned by experts). Three machine learning algorithms (Random Forest (RF), J48, and Naive Bayes (NB)) are then used to classify the articles, and the resulting classifications are compared against the expert reviews to determine the quality level of Wikipedia articles. The empirical evidence illustrates the effectiveness of the proposed approach, with average accuracies above 70% for the J48 algorithm. The precision, recall, and F-measure values of the classification models all exceed 0.7, indicating strong performance. Overall, these findings show that the method relies on sound criteria and classifies Wikipedia articles in accordance with experts' opinions, making it a dependable tool for quality assessment. In addition, the study underscores the importance of considering precision and recall together when assessing a model, demonstrating the method's usefulness for ensuring that content can be trusted and for supporting digital forensics.
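To make the pipeline concrete, the following minimal sketch (not the authors' code) gathers two simple article metrics through the MediaWiki API and then trains and evaluates the three classifier families named above with scikit-learn; the article titles, their quality labels, and the two metrics are hypothetical placeholders, and scikit-learn's DecisionTreeClassifier (CART) stands in for Weka's J48 (C4.5).

```python
# Illustrative sketch only: fetch per-article metrics from the MediaWiki API,
# then classify expert-labelled articles with RF, a J48-like tree, and NB.
import requests
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier   # stand-in for Weka's J48 (C4.5)
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

API = "https://en.wikipedia.org/w/api.php"

def article_metrics(title: str) -> dict:
    """Fetch two simple quality-related metrics for one article."""
    params = {
        "action": "query", "format": "json", "titles": title,
        "prop": "info|revisions", "rvprop": "timestamp", "rvlimit": "max",
    }
    page = next(iter(requests.get(API, params=params, timeout=30)
                     .json()["query"]["pages"].values()))
    return {
        "length": page.get("length", 0),              # article size in bytes
        "revisions": len(page.get("revisions", [])),  # revision count (capped at rvlimit)
    }

# Hypothetical labelled sample: (title, expert quality class). In the paper a
# much larger expert-rated corpus and richer metric set would be used.
labelled = [("Alan Turing", "high"), ("Ada Lovelace", "high"),
            ("Grace Hopper", "high"), ("Konrad Zuse", "low"),
            ("Charles Babbage", "low"), ("John von Neumann", "low")]
X = [[m["length"], m["revisions"]]
     for m in (article_metrics(t) for t, _ in labelled)]
y = [label for _, label in labelled]

# Cross-validated predictions, scored with accuracy, precision, recall, F-measure.
for name, clf in [("RF", RandomForestClassifier()),
                  ("J48-like tree", DecisionTreeClassifier()),
                  ("NB", GaussianNB())]:
    pred = cross_val_predict(clf, X, y, cv=3)
    p, r, f, _ = precision_recall_fscore_support(y, pred, average="weighted",
                                                 zero_division=0)
    print(f"{name}: acc={accuracy_score(y, pred):.2f} "
          f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Reporting precision, recall, and F-measure alongside accuracy, as in the last step, reflects the abstract's point that accuracy alone can mask weak performance on minority quality classes.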