Abstract

As a new model of distributed, collaborative information source, such as Wikipedia, is emerging, its content is constantly being generated, updated and maintained by various users and its data quality varies from time to time. Thus the quality assessment of the content is a pressing concern now. We observe that each article usually goes through a series of editing phases such as building structure, contributing text, discussing text, etc., gradually getting into the final quality state and that the articles of different quality classes exhibit specific edit cycle patterns. We propose a new approach to Assess Quality based on article's Editing History (AQEH) for a specific domain as follows. First, each article's editing history is transformed into a state sequence borrowing Hidden Markov Model(HMM). Second, edit cycle patterns are first extracted for each quality class and then each quality class is further refined into quality corpora by clustering. Now, each quality class is clearly represented by a series of quality corpora and each quality corpus is described by a group of frequently co-occurring edit cycle patterns. Finally, article quality can be determined in probabilistic sense by comparing the article with the quality corpora. Experimental results demonstrate that our method can capture and predict web article's quality accurately and objectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call