Abstract
Due to the ubiquitous nature and anonymity abuses in cyberspace, it’s difficult to make criminal identity tracing in cybercrime investigation. Writeprint identification offers a valuable tool to counter anonymity by applying stylometric analysis technique to help identify individuals based on textual traces. In this study, a framework for online writeprint identification is proposed. Variable length character n-gram is used to represent the author’s writing style. The technique of IG seeded GA based feature selection for Ensemble (IGAE) is also developed to build an identification model based on individual author level features. Several specific components for dealing with the individual feature set are integrated to improve the performance. The proposed feature and technique are evaluated on a real world data set encompassing reviews posted by 50 Amazon customers. The experimental results show the effectiveness of the proposed framework, with accuracy over 94% for 20 authors and over 80% for 50 ones. Compared with the baseline technique (Support Vector Machine), a higher performance is achieved by using IGAE, resulting in a 2% and 8% improvement over SVM for 20 and 50 authors respectively. Moreover, it has been shown that IGAE is more scalable in terms of the number of authors, than author group level based methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.