Abstract

This article examines the specifics of marking up scientific and technical texts when building a corpus of highly specialized texts. It lists the types of scientific and technical texts that serve as sources for the corpus and analyzes them in terms of marking up textual elements at different levels. It substantiates the need for interlevel types of markup and emphasizes the importance of structural markup when creating such a corpus, listing the structural elements of scientific and technical texts to be included. The current state of automatic term extraction from scientific and technical texts is analyzed, and marking multicomponent terminological units is shown to be the greatest difficulty. We identify literary terms, which may include various letters, symbols, numbers, or combinations of these, as objects whose processing requires the development of additional tools. References are analyzed as a factor influencing the classification and rubrication of scientific and technical texts. The article substantiates the need to study the types of references and the ways of marking them automatically in the corpus, as well as the need for separate markup of examples in scientific and technical texts.
