Abstract

This paper addresses the problem of numerically estimating the mutual semantic dependence of topical texts relative to the most rational (i.e., standard) variants for describing the knowledge fragments they represent. The proximity of a text to the standard is evaluated without searching for paraphrases. The problem is relevant to determining how significant information sources are for the tasks a user performs. One example is finding the optimal order in which to work with primary sources when forming a student's individual educational trajectory. In the proposed solution, the proximity of a text to the standard is assessed by dividing the words of each of its phrases into classes according to their TF-IDF values relative to the texts of a corpus previously formed by an expert. The analyzed texts are abstracts of scientific articles together with their titles. The paper considers principles for ranking and subsequently hierarchizing the texts of an original collection, based on assessment variants computed relative to the title and to the phrase closest to the standard. The semantic images of the texts closest to the standard are determined by the words with the highest TF-IDF values: when such words are adjacent in the linear sequence of a phrase, they are most likely related in meaning and, together with words whose values are close to the average of this measure, form key combinations. Analyzing how words with the highest TF-IDF values occur across different texts of the collection makes it possible to assess the relationship between their standards, which serves as the basis for assessing how the texts complement one another in meaning.
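The division of phrase words into classes by TF-IDF value can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tf_idf` and `split_into_classes`, the three class labels, the `tol` parameter, and the rule of a band around the mean are all assumptions made for illustration, since the abstract does not state how class boundaries are chosen. The sketch uses the common definition of TF-IDF (relative term frequency times the logarithm of inverse document frequency) on pre-tokenized texts.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in corpus for word in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


def split_into_classes(phrase, doc_scores, tol=0.25):
    """Split the words of a phrase into three classes by TF-IDF value.

    Assumption: a word is "middle" if its value lies within a band of
    width tol * (max - min) around the mean value of the phrase,
    "high" if above that band, and "low" if below it.
    """
    classes = {"high": [], "middle": [], "low": []}
    if not phrase:
        return classes
    values = [doc_scores.get(word, 0.0) for word in phrase]
    mean = sum(values) / len(values)
    band = tol * (max(values) - min(values))
    for word, value in zip(phrase, values):
        if value > mean + band:
            classes["high"].append(word)
        elif value < mean - band:
            classes["low"].append(word)
        else:
            classes["middle"].append(word)
    return classes


# Hypothetical toy corpus standing in for the expert-formed corpus.
corpus = [
    ["text", "standard", "proximity", "measure"],
    ["text", "corpus", "expert"],
    ["text", "student", "trajectory"],
]
scores = tf_idf(corpus)
print(split_into_classes(["text", "standard"], scores[0]))
```

In this sketch a word that occurs in every document of the corpus receives a TF-IDF value of zero, so it falls into the lowest class, whereas a word specific to one text rises to the highest class. Other boundary rules, such as quantiles or clustering, would fit the same interface.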
