Abstract

Illegal web information is common on the Internet. One effective way to curb it is to give courts sound evidence so that offenders can be punished under the law. This paper describes CWAAP, an authorship attribution platform for Chinese web information. Based on the linguistic characteristics of Chinese web text, the platform extracts lexical and structural features that reflect an author's writing habits, and uses support vector machines (SVM) to learn each author's writing features. To test the effectiveness of CWAAP, five experiments were performed on the platform using literature, blog, and BBS datasets. The experimental results show that lexical and structural features are effective; that training samples should contain at least 200 words; that, with Information Gain feature selection, 800 lexical features are sufficient to express an author's writing style; that differences in topic between authors have only a small effect; and that retaining all parts of speech gives the best results. These results confirm that the platform is effective and feasible for cybercrime forensics.
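The abstract names Information Gain as the feature selection method but does not give the platform's exact formulation. As a minimal sketch, assuming documents have already been word-segmented into sets of tokens and each lexical feature is a binary "term occurs in the document" indicator, Information Gain selection could look like the following. The function names and toy data are hypothetical; the selected terms would then be fed to an SVM classifier, which is not shown here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in document' w.r.t. the author label.

    IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t)
    """
    authors = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(a) for a in authors])
    h_t = entropy([with_t[a] for a in authors])
    h_nt = entropy([without_t[a] for a in authors])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_nt


def select_top_k(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy, hypothetical corpus: each document is a set of segmented words.
docs = [{"的", "然而"}, {"的", "然而"}, {"的", "所以"}, {"的", "所以"}]
labels = ["A", "A", "B", "B"]
print(select_top_k(docs, labels, 2))
```

Here "的" appears in every document and so carries no information about the author (IG of 0), while "然而" and "所以" each separate the two authors perfectly (IG of 1 bit). On a real corpus, k would be set to the value the experiments found adequate, such as 800.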
