Abstract

A formal approach to text content attribution is proposed. The study was conducted on Ukrainian scientific and technical texts. The results of applying the designed algorithms for automatic attribution of text content, based on NLP and stylometry methods, were analyzed. The prospects and specifics of applying stylometry information technologies to text content attribution were considered. Quantitative content analysis of scientific and technical text content uses content monitoring and text content analysis based on NLP, Web Mining, and stylometry methods to identify the set of authors whose speech style is similar to that of the analyzed text fragment. This narrows the search range for the subsequent use of stylometry methods, which determine the degree to which the analyzed text belongs to a particular author. The attribution method was decomposed on the basis of an analysis of speech coefficients: lexical diversity, degree (measure) of syntactic complexity, speech coherence, and the indexes of exclusivity and concentration of the text. In addition, the following parameters of the author's style were analyzed: the number of words in a given text, the total number of words in that text, the number of sentences, the number of prepositions, the number of conjunctions, the number of words with occurrence frequency 1, and the number of words with occurrence frequency 10 or more. Further experimental study requires testing the proposed method for identifying keywords in texts of other categories: scientific texts in the humanities, literary, journalistic, etc.
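The abstract names the coefficients and the counts they are built from but does not give the formulas. The sketch below shows one way such coefficients could be computed from those counts. Everything in it is an assumption made for illustration: the function name `style_coefficients`, the formulas themselves (distinct words over total words for lexical diversity, one minus sentences over words for syntactic complexity, prepositions plus conjunctions over three times the sentence count for coherence, and the share of distinct words occurring once or at least ten times for the exclusivity and concentration indexes), the reading of "the number of words" as the number of distinct words, and the regex-based tokenization. The preposition and conjunction lists are supplied by the caller; the study's own lists are not given here.

```python
import re
from collections import Counter


def style_coefficients(text, prepositions, conjunctions):
    """Compute illustrative style coefficients for one text.

    The formulas are assumptions for this sketch, not taken from the paper.
    """
    # Crude sentence split on ., !, ?; a real system would use an NLP tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of Unicode letters, allowing inner apostrophes and hyphens.
    words = re.findall(r"[^\W\d_]+(?:['’ʼ-][^\W\d_]+)*", text.lower())
    freq = Counter(words)

    n_total = len(words)        # total number of word occurrences
    n_distinct = len(freq)      # number of distinct words (assumed meaning)
    n_sent = len(sentences)     # number of sentences
    n_prep = sum(1 for w in words if w in prepositions)
    n_conj = sum(1 for w in words if w in conjunctions)
    w1 = sum(1 for c in freq.values() if c == 1)     # words occurring once
    w10 = sum(1 for c in freq.values() if c >= 10)   # words occurring 10+ times

    if n_total == 0 or n_sent == 0:
        raise ValueError("text must contain at least one word and one sentence")

    return {
        "lexical_diversity": n_distinct / n_total,
        "syntactic_complexity": 1 - n_sent / n_total,
        "coherence": (n_prep + n_conj) / (3 * n_sent),
        "exclusivity_index": w1 / n_distinct,
        "concentration_index": w10 / n_distinct,
    }


# Usage with a toy English sample; the word lists are hypothetical.
sample = "The cat sat on the mat. The dog ran and the cat hid."
coefficients = style_coefficients(sample, prepositions={"on"}, conjunctions={"and"})
for name, value in coefficients.items():
    print(f"{name}: {value:.3f}")
```

Comparing such coefficient vectors between an analyzed fragment and reference texts by known authors is one way to narrow the set of candidate authors, as the abstract describes; the comparison step itself is not sketched here.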

Highlights

  • The scheme of combining methods of attribution of Ukrainian scientific and technical text content consists of lexical and syntactic levels [1]

  • Authorship identification is a technique for text attribution when it is questionable who wrote it [17]. It is useful when several people claim to be the authors of the same publication [18] or in cases where nobody claims to be the author of text content [19], for example, so-called trolls in social networks during information warfare [20]

  • Develop content analysis software for attribution of Ukrainian texts based on stylistic analysis of the speech coefficients of the text content

Summary

QUANTITATIVE METHOD FOR AUTOMATIC ATTRIBUTION OF

[…] the results of applying the developed algorithms for automatic identification of the author of text content based on NLP and stylometry methods.

Vysotska

[…] and content analysis of the text based on
Introduction
Literature review and problem statement
The aim and objectives of the study
Method for determining a style of the text content
Findings
Conclusions