Abstract

Because of ambiguity and synonymy at all levels of language, identifying the semantic similarity of linguistic units is a challenging task in the semantic analysis of text information contained in news reports. The extraction of semantically similar text fragments, or paraphrases, is a current problem in fields such as semantic information retrieval, information extraction, machine translation, and the detection of copyright infringement, and it is widely used in rewriting. The article analyzes the main problems of rewriting, in particular the paraphrasing of syntactic text units while preserving their meaning. Modern methods for identifying the semantic similarity of words are considered, together with their advantages and disadvantages. A method for the automatic identification of synonymous fragments of news texts is proposed, based on WordNet and on a set of developed syntactic rules that store information about the grammatical characteristics of words. The advantage of this method is that it analyzes both the grammatical structure of the language and the meaning of words (using WordNet). The experimental corpus consists of news texts from the Reuters news agency, BBC World News, and CNN. The proposed method for identifying semantically similar text fragments makes it possible to define the common information space of current news and can be used to identify related texts in information retrieval systems, expert systems, analytical information systems, and rewriting systems. The automatic identification of semantic similarity could also be applied to the automated construction of ontologies, to the expansion of existing thesauri, and to the creation of new ones.
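The general idea of combining a grammatical check with a lexical synonymy check can be illustrated with a minimal sketch. This is not the article's implementation: the small `TOY_SYNSETS` dictionary is a hypothetical stand-in for WordNet, the part-of-speech tags are assumed to be supplied by an upstream tagger, and the single position-by-position rule is a simplification of the syntactic rules the abstract mentions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for WordNet: each lemma maps to the ids of the
# synonym sets it belongs to. A real system would query WordNet instead.
TOY_SYNSETS: Dict[str, FrozenSet[str]] = {
    "buy": frozenset({"acquire.v"}),
    "purchase": frozenset({"acquire.v"}),
    "firm": frozenset({"company.n"}),
    "company": frozenset({"company.n"}),
    "rival": frozenset({"competitor.n"}),
    "competitor": frozenset({"competitor.n"}),
}

# A token is a (lemma, coarse part-of-speech tag) pair; tags are assumed given.
Token = Tuple[str, str]


def words_match(a: Token, b: Token) -> bool:
    """Two words match if their POS tags agree and they are identical
    or share at least one synonym set in the toy lexicon."""
    (lemma_a, pos_a), (lemma_b, pos_b) = a, b
    if pos_a != pos_b:  # grammatical check
        return False
    if lemma_a == lemma_b:
        return True
    shared = TOY_SYNSETS.get(lemma_a, frozenset()) & TOY_SYNSETS.get(lemma_b, frozenset())
    return bool(shared)  # lexical (synonymy) check


def fragments_similar(x: List[Token], y: List[Token]) -> bool:
    """Simplified rule: fragments are similar if they have the same length
    and every aligned pair of words matches."""
    return len(x) == len(y) and all(words_match(a, b) for a, b in zip(x, y))


frag_a = [("firm", "NOUN"), ("buy", "VERB"), ("rival", "NOUN")]
frag_b = [("company", "NOUN"), ("purchase", "VERB"), ("competitor", "NOUN")]
print(fragments_similar(frag_a, frag_b))
```

The position-by-position alignment is the most restrictive assumption here; handling reordered or restructured paraphrases would require richer syntactic rules than this sketch attempts.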
