Abstract

Context. Authorization of the authorship of the text is a technique for determining the author of the text, when it is ambiguous who wrote it. It is useful when several people claim to be the authors of one publication or in cases where nobody claims to authorship of text content, for example, so-called trolls in social networks during an information warfare. The complexity of the problem of the author’s text, obviously, is exponentially higher, more likely authors. The presence of author’s text samples is also significant in advancing this problem. The attribution of the author’s text includes the following three problems:– author discovery of text from probable or expected authors group, where the author is always in a suspects group;– not identification of the author of a text author from a group of probable or expected authors, where the author may not be in a group of suspects;– assessment of the possibility of this text, written by the author or not.Therefore, the task of automatically determining the author of text content of scientific and technical direction is relevant and requires new (more perfect) approaches to its solution.Objective of the study is to develop a method for determining the author in Ukrainian texts based on the technology of lingometry. Method. Lingvometric method of algorithmic provision of content monitoring processes for solving the problem of automatic determination of the author of Ukrainian-language text content on the basis of technology of statistical analysis of linguistic diversity coefficients is developed. A decomposition of the method of determination of the author on the basis of analysis of such broadcasting factors as lexical diversity, degree (degree) of syntactic complexity, speech connectivity, singularity indexes and text concentrations is made. Also, author’s style parameters are analyzed as the number of words in a particular text, the total number of words in this text, the number of sentences, the number of prepositions, the number of conjunctions, the number of words with the frequency of 1, and the number of words with a frequency of 10 or more. The features of the developed is the adaptation of the morphological and syntactic analysis of lexical units to the features of the designs of Ukrainian-language words / texts. That is, in the analysis of linguistic units of the type of words, the affiliation with the part of speech and declarations within this part of the language was taken into account. To do this, an analysis of the flexion of these words was carried out for classification, the allocation of the basis for the formation of the corresponding alphabet-frequency dictionaries. The filling of these dictionaries was further taken into account in the subsequent steps of determining the authorship of the text as the calculation of parameters and coefficients of copyright broadcasting. For the individual style of a writer, it is precisely service (stop or reference) words that are indicative because they are not related to the topic and content of the publication.Results. A comparison of results on a plurality of 200 individual technical works of about 100 different authors over the period 2001–2017 has been made to determine whether the coefficients of the diversity of the text of these authors are different at different intervals.Conclusions. It has been found that for the chosen experimental base with over 200 works of the best results, the method of analysis of the article without initial obligatory information as annotations and keywords in various languages and the list of literature achieves the density criterion.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.