Analysis of effective techniques and algorithms in terms of “text mining” to predict the authorship in Albanian language

Miranda Harizaj, Alfons Harizaj, Arli Minga

doi:10.59380/crj.v1i1.2744

Abstract

Natural Language Processing has grown considerably in importance in recent years. Analyzing written texts with “text mining” techniques and extracting their features is a prerequisite for applying them to various purposes. This paper compares some of the most effective “text mining” techniques and algorithms for predicting the authorship of a text written in the Albanian language, training the model on a corpus of articles by some of the best-known bloggers in Albanian journalism. When determining the authorship of a text, many important elements must be kept in mind, such as the number of sentences, sentence structure, number of words per sentence, repetition of the same word, length of the words used, frequency of punctuation, and the literary figures used; these elements best display each author's unique narrative style. This paper can serve as a good starting point both for its specific objective, predicting the authorship of an anonymous text, and for other “text mining” applications involving the Albanian language.
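As an illustration of the surface features listed in the abstract (sentence count, words per sentence, word repetition, word length, punctuation frequency), the following is a minimal Python sketch. It is not the authors' implementation: the function name `stylometric_features`, the feature names, and the naive sentence-splitting rule are all assumptions made for this example, and sentence structure and literary figures are not covered.

```python
import re
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few simple style features (hypothetical names, illustration only)."""
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens; \w is Unicode-aware for Python 3 strings.
    words = re.findall(r"\w+", text.lower())
    # Every character that is neither a word character nor whitespace.
    punctuation = re.findall(r"[^\w\s]", text)

    counts = Counter(words)
    n_words = len(words)
    n_sentences = len(sentences)
    # Tokens belonging to a word that occurs more than once in the text.
    repeated_tokens = sum(c for c in counts.values() if c > 1)

    return {
        "sentence_count": n_sentences,
        "avg_words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "repeated_word_ratio": repeated_tokens / n_words if n_words else 0.0,
        "punctuation_per_word": len(punctuation) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "Ai shkoi. Ai erdhi, pastaj iku!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value}")
```

A feature dictionary of this kind, computed per article, could be turned into a numeric vector and passed to any standard classifier; which features and classifiers actually perform best for Albanian is the question the paper itself addresses.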

Full Text