Abstract

The subject of the article is determining the degree of connectedness of scientific and technical texts by means of statistical calculations. The aim of the study is to investigate how the coherence of fluctuations in the relative frequencies of keywords across paragraphs can be used to determine the lexical coherence and thematic unity of scientific and technical texts. The tasks are to develop a method for determining the thematic unity of a text at the level of its set of paragraphs, to develop a method for determining the coherence of a text at the same level, and to test both methods on a collection of documents. The methods used are statistical analysis and computational experiments. The following results were obtained. The study has shown that, to determine the degree of coherence of a scientific and technical text at the paragraph level, it is advisable to cluster paragraphs as points in the keyword space. This makes it possible to calculate the degree of thematic unity both within the clusters and across the entire text. The coherence of individual text fragments and of the whole text is determined by analyzing the sequence of paragraph numbers in the clusters. This makes it possible to formally assess the quality of the material presented in a scientific and technical article or a textbook. Conclusions. The scientific novelty of the study is as follows: the method for determining the degree of connectedness (coherence and thematic unity) of scientific and technical texts at the paragraph level was refined by clustering paragraphs in the keyword space, by calculating the degree of thematic unity within the clusters and in the text as a whole, and by analyzing the sequence of paragraph numbers in the clusters to determine the coherence of text fragments and of the whole text. The methods are language-independent, rest on clearly stated hypotheses, and complement each other.
The methods include an adjustable element that can be used to adapt them to different thematic and stylistic domains. It has been shown experimentally that the proposed methods for determining the connectedness of scientific and technical texts are effective and can serve as the basis for an information technology for content analysis of scientific and technical texts. The proposed methods do not rely on web resources for syntactic and semantic analysis, so they can be used autonomously.
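The abstract gives no formulas, so the following Python sketch shows only one possible reading of the pipeline it describes. It is not the authors' method, and it rests on several assumptions. Each paragraph is represented as a vector of relative keyword frequencies. A simple greedy similarity-threshold clustering stands in for whatever clustering the article actually uses. Thematic unity is taken to be the mean pairwise cosine similarity. Coherence is taken to be the share of consecutive paragraph numbers. All function names, the example keywords, and the `threshold` parameter (a hypothetical stand-in for the adjustable element) are illustrative.

```python
# Hypothetical sketch, NOT the published method: paragraphs as relative
# keyword-frequency vectors, threshold clustering, thematic unity, and
# coherence derived from paragraph-number sequences in clusters.
import math
import re
from collections import Counter


def keyword_vectors(paragraphs, keywords):
    """Relative frequency of each keyword in each paragraph."""
    vectors = []
    for text in paragraphs:
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty paragraphs
        vectors.append([counts[k] / total for k in keywords])
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector contains no keywords."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold):
    """Greedy clustering (assumed): each paragraph joins the first existing
    cluster holding a paragraph with similarity >= threshold; otherwise it
    starts a new cluster. `threshold` is the hypothetical tunable element."""
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if any(cosine(v, vectors[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def thematic_unity(indices, vectors):
    """Mean pairwise cosine similarity of the given paragraphs (assumed)."""
    pairs = [(a, b) for k, a in enumerate(indices) for b in indices[k + 1:]]
    if not pairs:
        return 1.0  # a single paragraph is trivially unified
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def fragment_coherence(indices):
    """Share of gaps between sorted paragraph numbers that equal 1."""
    s = sorted(indices)
    gaps = [b - a for a, b in zip(s, s[1:])]
    return sum(g == 1 for g in gaps) / len(gaps) if gaps else 1.0


def text_coherence(clusters, n_paragraphs):
    """Share of adjacent paragraph pairs (i, i + 1) in the same cluster."""
    if n_paragraphs < 2:
        return 1.0
    label = {i: c_id for c_id, members in enumerate(clusters) for i in members}
    same = sum(label[i] == label[i + 1] for i in range(n_paragraphs - 1))
    return same / (n_paragraphs - 1)


# Toy example with invented paragraphs and keywords.
paragraphs = [
    "Neural networks learn weights from data.",
    "Training neural networks adjusts weights by gradient descent.",
    "Corrosion damages steel bridges over time.",
    "Protective coatings slow corrosion of steel.",
]
keywords = ["neural", "networks", "weights", "corrosion", "steel"]
vecs = keyword_vectors(paragraphs, keywords)
clusters = cluster(vecs, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
print(round(thematic_unity(list(range(len(paragraphs))), vecs), 3))  # 0.333
print([fragment_coherence(c) for c in clusters])  # [1.0, 1.0]
print(round(text_coherence(clusters, len(paragraphs)), 3))  # 0.667
```

In this toy text, each topic forms its own cluster with high internal unity. The whole-text unity is low because the text contains two unrelated topics. Raising or lowering `threshold` changes how readily paragraphs merge into clusters, which is how this sketch mimics adapting the method to a different domain. The actual tuning mechanism in the article may differ.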
