Abstract

This paper considers the specifics of applying linguo-statistical technologies to identify the authorial style of text content of a scientific and technical profile. Quantitative linguistic analysis of a text draws on NLP-based content monitoring to identify and analyze stop words, keywords, and set phrases, and to study N-grams. The N-grams are used in linguometric methods to estimate, as a percentage, the degree to which a given text belongs to a particular author. A quantitative method for automatic authorship attribution of text content was developed based on statistical analysis of the 3-gram distribution. An approach to identifying the author of a Ukrainian-language text of a scientific and technical profile is proposed. Experimental results were obtained for determining whether an analyzed text belongs to a specific author when a reference text is available. Applying linguo-statistical analysis of 3-grams to a set of articles makes it possible to form a subset of publications with similar linguistic characteristics. Imposing additional conditions on this subset in the form of further statistical and quantitative analyses (keywords, set expressions, stylometric and linguometric analyses, etc.) significantly reduces it and narrows the list of the most likely authors. For accurate and effective content analysis when determining the degree to which a text belongs to a particular author, we propose analyzing the reference text and the text under consideration in several stages: linguometric analysis of the coefficients of diversity of the author's speech, stylometric analysis, analysis of set expressions, and linguo-statistical analysis of 3-grams.
For automated text processing, what matters is not only how frequently a certain category occurs but also whether it occurs in the studied text at all. Quantitative analysis makes it possible to draw objective conclusions about the orientation of materials from the number of times the units of analysis are used in the studied texts. Qualitative analysis serves the same purpose, but by examining whether (and in what context) a certain important original category is present at all.
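
The 3-gram comparison against a reference text can be illustrated with a minimal sketch. The abstract does not say whether the 3-grams are built over letters or words, nor which statistic turns the two distributions into a percentage, so everything here is an assumption for illustration: letter 3-grams taken inside each word, cosine similarity as the comparison measure, and the hypothetical function names `trigram_profile` and `similarity_percent`.

```python
from collections import Counter
import math
import re


def trigram_profile(text: str) -> Counter:
    """Count overlapping letter 3-grams inside each word (assumed unit)."""
    counts = Counter()
    # Lowercase, then keep runs of letters only; this pattern matches
    # Cyrillic as well as Latin letters in Python 3 strings.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


def similarity_percent(reference: str, candidate: str) -> float:
    """Cosine similarity of two 3-gram profiles, scaled to 0..100.

    Cosine similarity is an assumed stand-in, not the paper's statistic.
    """
    a = trigram_profile(reference)
    b = trigram_profile(candidate)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 100.0 * dot / norm if norm else 0.0
```

Calling `similarity_percent(reference_text, analyzed_text)` for each candidate article and keeping those above a chosen threshold would give the kind of subset the abstract describes, which the further stylometric and keyword checks could then narrow.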

Highlights

  • As text content becomes increasingly available and widely distributed on the Internet, automatic methods of text content analysis are becoming more important [1]

  • The tasks of content analysis include classifying and clustering text-based publications according to various criteria, for example genre, epoch of writing, format, emotional coloration, and style of speech, as well as the problem of text authorship attribution [2]

  • If we study the Ukrainian language of the 20th century, the general totality (GT) is all the texts of the 20th century [3]


Summary

Introduction

As text content becomes increasingly available and widely distributed on the Internet, automatic methods of text content analysis are becoming more important [1]. The tasks of content analysis include classifying and clustering text-based publications according to various criteria, for example genre, epoch of writing, format (a novel, an essay, or a scientific article), emotional coloration, and style of speech, as well as the problem of text authorship attribution [2]. As access to data becomes simpler and it becomes easier to find, copy, and distribute information on the Internet, the problem of authorship attribution is becoming increasingly relevant [3]. Authorship attribution is defined as the process of recognizing the author from the set of general and individual features of a text that constitute the author’s style [6].
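
The "set of general and individual features" in this definition can be made concrete with a few simple lexical measures. The sketch below is an illustration only: the source names "coefficients of the diversity of the author's speech" without defining them, so the three measures used here (type-token ratio, share of words occurring once, mean word length) and the function name `diversity_features` are assumptions, not the paper's coefficients.

```python
import re


def diversity_features(text: str) -> dict:
    """Compute a few assumed lexical-diversity measures for a text."""
    # Lowercased runs of letters; works for Cyrillic and Latin alike.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {"type_token_ratio": 0.0, "hapax_share": 0.0,
                "mean_word_length": 0.0}
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    # Words that occur exactly once in the text.
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "type_token_ratio": len(counts) / len(words),
        "hapax_share": hapax / len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }
```

Computing these values for a reference text and for the text under analysis gives two small feature vectors that can be compared side by side. Note that the type-token ratio depends on text length, so the two texts should be of comparable size.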
