Abstract

Authorship attribution is the task of identifying the writer of an unknown text by assigning it to one of a set of known writers. Each author's writing style is distinct and can be used to discriminate among authors. Different parameters are responsible for capturing such differences. When the writing samples collected for an author all belong to a short period, they can contribute efficiently to the identification of an unknown sample. This paper considers the author identification problem in which writing samples are not available from the same time period; instead, the samples were collected over a long period of time. Character n-gram, word n-gram and part-of-speech (PoS) n-gram features were used to build the model, as they capture a writer's style in terms of both content and the statistical characteristics of the writing. We applied the support vector machine algorithm for classification, and the experiments produced effective results. In discriminating among multiple authors, corpus selection and construction was the most tedious task, and it was carried out effectively. Accuracy was observed to vary with feature type: word and character n-grams showed better accuracy than PoS n-grams.
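
The three feature types named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample sentence, the choice of n, and the hand-written PoS tags are all assumptions made for illustration. The sketch stops at feature extraction; in a full pipeline the resulting frequency profiles would be vectorized and passed to a support vector machine classifier.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def token_ngrams(tokens, n):
    """Count overlapping n-grams over a token sequence (words or PoS tags)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_frequencies(counts):
    """Normalize raw counts so that texts of different lengths are comparable."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

# Hypothetical sample text; a real corpus would hold many documents per author.
sample = "the cat sat on the mat"
words = sample.split()

# Hand-written tags for illustration only; a real system would use a PoS tagger.
pos_tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]

char_features = relative_frequencies(char_ngrams(sample, 3))   # character 3-grams
word_features = relative_frequencies(token_ngrams(words, 2))   # word 2-grams
pos_features = relative_frequencies(token_ngrams(pos_tags, 2)) # PoS 2-grams
```

Character and word n-grams are computed directly from the text, while PoS n-grams reuse the same token-level function over a tag sequence, which is why a single helper covers both cases here.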
