Abstract

This article addresses the problem of numerically estimating how close a thematic text is to the most rational (reference) linguistic expression of the piece of knowledge it describes. The problem matters for the targeted selection of textual information without loss of its useful semantic content. Practical applications include selecting articles for publication in scientific journals and developing training courses and educational portals. In the proposed solution, the proximity of a text to a semantic pattern (i.e., a sense standard) is assessed by dividing the words of each phrase into classes according to their TF-IDF values, computed relative to the texts of a corpus formed in advance by an expert. The texts analyzed in the paper are abstracts of scientific articles together with their titles. The semantic images of the texts closest to the standard are determined by the words with the highest TF-IDF values, which, when adjacent in the linear word sequence, are most likely related in meaning and form key combinations. The proposed numerical estimate of proximity to the standard makes it possible to rank articles by the significance of the knowledge fragments they describe with respect to a given subject area, and by the non-redundancy of the description itself.
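
The idea of classing the words of a phrase by TF-IDF and reading adjacent high-valued words as key combinations can be illustrated with a minimal Python sketch. It is not the authors' procedure: the tokenizer, the smoothed IDF formula, the toy corpus, and the rule that the top class holds words scoring at least `ratio` times the maximum are all assumptions made for illustration, and the function names `tokenize`, `tf_idf`, and `key_combinations` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of Latin letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(phrase_tokens, corpus_docs):
    """TF-IDF of each distinct word of a phrase relative to a corpus.

    tf  = occurrences of the word in the phrase / phrase length
    idf = log((1 + N) / (1 + df)) + 1  (smoothed variant, an assumption)
    """
    n_docs = len(corpus_docs)
    doc_sets = [set(doc) for doc in corpus_docs]
    scores = {}
    for word, count in Counter(phrase_tokens).items():
        df = sum(1 for doc in doc_sets if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(phrase_tokens)) * idf
    return scores


def key_combinations(phrase_tokens, scores, ratio=0.8):
    """Return runs of two or more adjacent words from the top TF-IDF class.

    The top class is hypothetically defined as score >= ratio * max score.
    """
    if not scores:
        return []
    threshold = ratio * max(scores.values())
    runs, current = [], []
    for word in phrase_tokens:
        if scores[word] >= threshold:
            current.append(word)
        else:
            if len(current) > 1:
                runs.append(tuple(current))
            current = []
    if len(current) > 1:
        runs.append(tuple(current))
    return runs


# Toy stand-in for the expert-formed corpus (invented for illustration).
corpus = [
    tokenize("the method of the text analysis"),
    tokenize("the ranking of the articles"),
    tokenize("the selection of the text"),
]
phrase = tokenize("semantic pattern of the text")
scores = tf_idf(phrase, corpus)
combos = key_combinations(phrase, scores)
print(combos)  # [('semantic', 'pattern')]
```

In this toy setting, words that are rare in the corpus score highest, so the neighbors "semantic" and "pattern" are grouped into one key combination. How the classes are actually delimited and how the final proximity estimate is computed are not specified in this abstract.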
