Abstract

The easiness of reaching information through the internet and social media and the expansiveness of opportunities for searching, copying, and spreading data have caused some problems in identifying an author for a specific text. A text carries the characteristic features of the person who wrote it, and these features can be used to identify its author. For this study, we are offering a method that is based on an approach using ensemble learning algorithm (ELA) and genetic algorithm (GA) for author identification in Tur-kish texts. The raw data set, which includes 40 authors and 3269 texts, was created from Turkish news websites and analyzed in pre-processing step. After, syntactic and structural analyses were done on the data and, in total, 6 different data sets were created. Each of the data sets was subjected to the feature selection process by using GA and ELA approach together. Each of the obtained data sets from the previous step was classified by using the ELA's bagging method which contains 5 different classifiers, namely, Naive Bayes, K-Nearest Neighbor, Artificial Neural Networks, Support Vector Machine, and Decision Tree. After applying the aforementioned processes to the raw data, the author identification approach reached 89% accuracy. The combination of ELA and GA has a strong potential to identify the author of a text.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.