Abstract

The scientific community has proposed models and data structures such as LERP-RSA arrays (Longest Expected Repeated Pattern Reduced Suffix Array), a tag classifier (a model based on the Stanford NER three-class model), structures based on DNA sequences, and graph representations. The algorithms used include Greedy String Tiling, ARPaD, shingling, statistical methods, and genetic algorithms. Considerable attention is also paid to morphological analysis, lemmatization, and text pre-processing. Only some of these models and algorithms have software implementations.

The purpose of this work is to develop a text model for detecting borrowings and to bring it to a software implementation. The task is to develop an object-oriented model and a software implementation of a graph text model, applied to the problem of borrowing detection. A further task is to obtain execution-time measurements for the implementation in order to assess whether it can be used in an academic environment.

The main idea of the graph model is to represent the text as a weighted directed graph. The weight of a vertex is a character or a sequence of characters. The weight of an edge is the set of numbers of the paths that include that edge. The model is formalized using the apparatus of constructive-synthesizing modeling. To create graphs, a constructor and its components are defined: the carrier, the signature, and a set of statements of the information support for construction. The constructor is transformed by specialization, interpretation, and concretization.

The object-oriented model is built on the basis of this model. It includes three classes: Vertex, Graph, and Work. An object of class Work represents the text as a set of objects of class Graph. The correspondences between the components of the two models are established. The object-oriented model is implemented in software.

Execution times for graph construction and text comparison are reported. At this stage, the software implementation of the model shows acceptable time performance. Further research in this direction is promising, and directions for improving the model and the program are proposed.
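The three-class structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each vertex holds one word, that each sentence forms one numbered path, and that a Work holds one Graph per blank-line-separated fragment. The names `add_path`, `from_text`, `edge_set`, and `shared_edge_ratio` are hypothetical, and the shared-edge ratio is only a stand-in for the comparison procedure, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A vertex whose weight is a character sequence (assumed here: one word)."""
    label: str


@dataclass
class Graph:
    """Directed graph; each edge carries the set of path numbers that use it."""
    vertices: dict = field(default_factory=dict)  # label -> Vertex
    edges: dict = field(default_factory=dict)     # (src, dst) -> set of path numbers

    def add_path(self, number, labels):
        """Add one path (assumed here: one sentence) as a chain of vertices."""
        for label in labels:
            self.vertices.setdefault(label, Vertex(label))
        for src, dst in zip(labels, labels[1:]):
            self.edges.setdefault((src, dst), set()).add(number)


@dataclass
class Work:
    """A text represented as a collection of Graph objects (a list here)."""
    graphs: list = field(default_factory=list)

    @classmethod
    def from_text(cls, text):
        """Build one Graph per fragment; fragment and sentence splitting are naive."""
        work = cls()
        for fragment in text.split("\n\n"):
            graph = Graph()
            sentences = [s for s in fragment.split(".") if s.strip()]
            for number, sentence in enumerate(sentences):
                graph.add_path(number, sentence.lower().split())
            if graph.vertices:
                work.graphs.append(graph)
        return work

    def edge_set(self):
        """All (src, dst) pairs that occur in any graph of this work."""
        return {edge for graph in self.graphs for edge in graph.edges}


def shared_edge_ratio(suspect, source):
    """Hypothetical similarity: fraction of the suspect's edges found in the source."""
    suspect_edges = suspect.edge_set()
    if not suspect_edges:
        return 0.0
    return len(suspect_edges & source.edge_set()) / len(suspect_edges)


source = Work.from_text("The cat sat on the mat. The dog sat on the rug.")
suspect = Work.from_text("A cat sat on the sofa.")
print(shared_edge_ratio(suspect, source))  # 0.6
```

Storing path numbers on the edges keeps every sentence recoverable from the graph, which is the property the abstract attributes to edge weights. A real comparison would likely also need the lemmatization and pre-processing mentioned above.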

Introduction

The purpose of this work is to develop a text model for detecting borrowings and to bring it to a software implementation. The graph model represents the text as a directed weighted graph [12]. Here $s_i$ and $\tilde{s}_i$ are the substitution relations for recognizing a language construct and for building a graph construct, respectively, and $g_i$ and $\tilde{g}_i$ are the operations on the attributes of the language construct and of the graph, its vertices, and its arcs, respectively. The rule for adding the first vertex to the graph has the form:
