Abstract

This paper studies the error of identifying the sample distribution of a multidimensional discrete random variable among a library of reference patterns, as a function of the dimension of the random vector, the sample length, and the distance between two reference distributions in the C and L1 norms. The recognition error in the L1 norm is shown to be significantly lower than in the C norm. Reference distributions of n-grams in texts are considered as a practical application. Identification accuracy turns out to be determined mainly by the individual characteristics of the reference patterns rather than by the distances between them. An algorithm is developed for testing a system of reference patterns for recognition accuracy.
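The identification task described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes nearest-pattern classification of an empirical distribution, with the multidimensional variable flattened to a single state index, and estimates the error rate by simulation. All names (`empirical_distribution`, `distance`, `identify`, `error_rate`) and the example reference distributions are hypothetical.

```python
import random
from collections import Counter


def empirical_distribution(sample, n_states):
    """Relative frequencies of states 0..n_states-1 in the sample."""
    counts = Counter(sample)
    return [counts.get(k, 0) / len(sample) for k in range(n_states)]


def distance(p, q, norm):
    """Distance between two distributions in the L1 or C (sup) norm."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if norm == "l1":
        return sum(diffs)
    if norm == "c":
        return max(diffs)
    raise ValueError("norm must be 'l1' or 'c'")


def identify(sample, references, norm="l1"):
    """Index of the reference pattern closest to the sample distribution."""
    emp = empirical_distribution(sample, len(references[0]))
    dists = [distance(emp, ref, norm) for ref in references]
    return dists.index(min(dists))


def error_rate(references, sample_len, trials, norm, seed=0):
    """Monte Carlo estimate of the misidentification rate (assumed setup)."""
    rng = random.Random(seed)
    states = range(len(references[0]))
    errors = 0
    for true_idx, ref in enumerate(references):
        for _ in range(trials):
            # Draw an i.i.d. sample from the true reference distribution.
            sample = rng.choices(states, weights=ref, k=sample_len)
            errors += identify(sample, references, norm) != true_idx
    return errors / (trials * len(references))


if __name__ == "__main__":
    # Two made-up reference distributions over three states.
    refs = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3]]
    for norm in ("c", "l1"):
        print(norm, error_rate(refs, sample_len=200, trials=200, norm=norm))
```

Running the script prints an estimated error rate per norm for the made-up patterns; whether L1 outperforms C in a given case depends on the patterns and the sample length, which is the dependence the paper investigates.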
