Abstract

For artificial languages formed by many big data sources, a prescription in the Markov alphabet is investigated that links a word to its source through the phonemes, morphemes, morphs and allomorphs of the language. The artificial language is studied in the context of synthesizing a methodology for classifying the risks of money laundering and terrorist financing in the big data of organizational systems. New provisions of the theory of Markov algorithms are developed. Methods of data decomposition using the N-scheme of the Markov algorithm, developed to replace Markov's well-known g-scheme, are studied. A method for decomposing words into phonemes in the Markov alphabet is considered. It is established that Markov phonemes are formed from the terminal symbols of the alphabet M = litjпо1…оx /abdgckm for big data input samples and of the alphabet MK = litj/abdgckm for the control channel. The method of synthesizing morphemes from Markov phonemes is studied from the standpoint of both the theory of algorithms and linguistics. The properties of the roots and affixes of morphemes are considered from the standpoint of the theory of algorithms, in particular in the N-scheme of the Markov algorithm. A scientific contradiction in the linguistic approach to the formation of morphemes is revealed, and a way to overcome it is proposed. Results are given that theoretically substantiate, within the theory of algorithms, a method for determining (assigning) morphs and allomorphs for morphemes. New properties of Markov phonemes in the Markov alphabets AM±2 and KMK±2, not previously observed in linguistics, are established. Some scientific conclusions previously obtained in linguistics are confirmed from the standpoint of the theory of algorithms.
A generalized reference model of a morpheme has been developed, its abstract nature has been confirmed, and methods for synthesizing morphs and allomorphs in a categorical representation format have been given. A statement is synthesized in the form of a Markov diagram of occurrences, and a comprehensive analysis of it is carried out. It is concluded that decomposing big data into phonemes and morphs allows statements to be represented as chains of occurrences that can be recoded into a categorical representation format, and that analyzing the numbers-types and numbers of morphisms makes it possible to extract from morphs their signatures, which are allomorphs.
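
For readers unfamiliar with Markov algorithms, the following is a minimal sketch of the classical Markov normal algorithm only: an ordered list of substitution rules, where the first applicable rule rewrites the leftmost occurrence of its pattern and the scan restarts from the first rule. It does not reproduce the N-scheme or the g-scheme discussed above, and the names `run_markov`, `Rule` and `max_steps`, as well as the example rule, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# A rule is (pattern, replacement, is_terminal). Illustrative type, not from the paper.
Rule = Tuple[str, str, bool]


def run_markov(rules: List[Rule], word: str, max_steps: int = 10_000) -> str:
    """Apply a classical Markov normal algorithm to `word`.

    At each step the first rule whose pattern occurs in the word is applied
    to the leftmost occurrence. A terminal rule stops the algorithm, and so
    does a step in which no rule applies.
    """
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            pos = word.find(pattern)
            if pos != -1:
                # Rewrite only the leftmost occurrence of the pattern.
                word = word[:pos] + replacement + word[pos + len(pattern):]
                if is_terminal:
                    return word
                break  # restart the scan from the first rule
        else:
            return word  # no rule applies: the algorithm halts
    raise RuntimeError("step limit reached; the rule set may not terminate")


# Example: a single non-terminal rule that moves every 'a' in front of every 'b'.
sorted_word = run_markov([("ba", "ab", False)], "babba")
print(sorted_word)
```

The step limit is a practical safeguard added here, since a rule set need not terminate on every input.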
