This paper addresses the pressing problem of speaker verification by comparing voice time series. Its aim is to determine which orders of mel-frequency cepstral coefficients (MFCCs) most accurately capture the difference between an authentic voice and an artificially generated copy, so that they can be used as input to a neural network model in a resource-limited environment. To achieve this goal, the following tasks were accomplished: a conceptual model of the technology for determining the similarity threshold of two audio series was developed; the orders of mel-frequency cepstral coefficients with the most characteristic differences between a recorded and a generated voice were determined on the basis of neural network analysis; an experimental study was conducted of how execution time and computational load depend on the constructed feature vector when assessing the degree of similarity of two time series; and the optimal similarity threshold was determined for the chosen dataset. The developed model was tested on a dataset combining the DEEP-VOICE dataset with our own dataset. Applying the developed technology with the selected MFCCs showed an increase of 43% compared with using all of the coefficients. Based on the experimental studies, the dynamic time warping (DTW) acceptance threshold was set at 0.37.
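
The comparison step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the MFCC matrices are already computed with shape (frames, coefficients); the DTW cost is length-normalized by dividing by the total number of frames in both sequences; a pair is accepted when the normalized distance is at or below the threshold; and the coefficient orders passed in `orders` are hypothetical placeholders, since the selected orders are not listed here. The function names `dtw_distance` and `is_same_speaker` are likewise illustrative.

```python
import numpy as np


def dtw_distance(a, b):
    """Length-normalized DTW distance between two feature sequences.

    a, b: arrays of shape (frames, coefficients).
    Normalizing by (len(a) + len(b)) is an assumption of this sketch.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between every frame of a and every frame of b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated cost matrix with an extra border row and column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # step along a only
                acc[i, j - 1],      # step along b only
                acc[i - 1, j - 1],  # step along both sequences
            )
    return acc[n, m] / (n + m)


def is_same_speaker(mfcc_a, mfcc_b, orders, threshold=0.37):
    """Compare only the chosen MFCC orders and apply the acceptance threshold.

    `orders` is a list of column indices; the values used by callers here are
    hypothetical. Accepting when distance <= threshold is an assumption.
    """
    sub_a = mfcc_a[:, orders]
    sub_b = mfcc_b[:, orders]
    return dtw_distance(sub_a, sub_b) <= threshold
```

Restricting the columns before computing the frame distances shrinks the feature vector, which reduces the work done per pair of frames in the cost matrix. The meaning of a threshold such as 0.37 depends on how the features are scaled and how the distance is normalized, so the value is only comparable under the same preprocessing.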