APPLICATION OF SYNTHETIC IMAGES TO THE CLASSIFICATION PROBLEM, WITH LUNG CANCER DIAGNOSIS AS AN EXAMPLE

Гундырев Иван Анатольевич, Бельская Людмила Владимировна, Косенок Виктор Константинович, Сарф Елена Александровна

doi:10.15690/vramn73296-104-936

Abstract

Background: From a mathematical point of view, medical diagnosis is a data classification problem. It is important to understand how strongly distortions introduced when primary diagnostic information is collected, in particular the results of biochemical tests, contribute to classification errors.

Aims: To determine how the prediction result depends on the variability of the primary diagnostic information, using a model classifier as an example.

Materials and methods: The case-control study enrolled patients divided into two groups: the main group (diagnosed with lung cancer, n=200) and the control group (conditionally healthy, n=500). All participants completed a questionnaire and underwent a biochemical study of saliva. Patients in the main group and the comparison group were hospitalized for surgical treatment, after which the diagnosis was verified histologically. The biochemical composition of saliva was determined spectrophotometrically. Based on the data obtained, a model classifier for the diagnosis of lung cancer (a random forest) was constructed. Each parameter underlying the classifier was then perturbed by deviations within a specified range (±1–5%, ±5–10%, ±10–15%), creating synthetic images. The classification results were evaluated by cross-validation.

Results: The baseline diagnostic characteristics of the model classifier were determined (sensitivity 72.5%, specificity 86.0%). Under general classification, the diagnostic characteristics deteriorate as the deviations of the synthetic images from the baseline increase. Confident classification, by contrast, yields higher values (sensitivity 81.8%, specificity 93.1%). In confident classification, similar images that the classifier assigns to different classes are removed, whereas in general classification they are retained. The difference between the two classification methods is associated with images for which the classifier returns a class-membership value in the range 0.45–0.55. It is therefore necessary to introduce a third class into the classifier, the so-called gray zone (0.4–0.6), because the probability of an erroneous diagnosis is significantly higher in this region.

Conclusions: The results indicate that measurement error in the range ±1–15% does not significantly affect the quality of classification.
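The perturbation step and the proposed gray zone can be illustrated with a minimal sketch. The abstract does not state how the deviations were drawn, so the uniform magnitude and random sign used here are assumptions, and the function names `perturb`, `classify_with_gray_zone`, and `sensitivity_specificity` are hypothetical. The random forest itself is not reproduced, the scores in the demo are invented placeholders rather than study data, and the removal of similar images used for confident classification is not modeled.

```python
import random


def perturb(values, low, high, rng):
    """Return a synthetic copy of `values`.

    Each value is shifted by a relative deviation whose magnitude is drawn
    uniformly from [low, high] and whose sign is chosen at random.
    This is an assumed reading of a range such as "±5–10%".
    """
    synthetic = []
    for v in values:
        magnitude = rng.uniform(low, high)
        sign = rng.choice((-1.0, 1.0))
        synthetic.append(v * (1.0 + sign * magnitude))
    return synthetic


def classify_with_gray_zone(score, lower=0.4, upper=0.6):
    """Map a class-membership score to 'cancer', 'control', or 'gray'.

    Scores inside [lower, upper] fall into the gray zone (third class).
    """
    if score > upper:
        return "cancer"
    if score < lower:
        return "control"
    return "gray"


def sensitivity_specificity(y_true, labels):
    """Sensitivity and specificity over non-gray cases only (an assumption)."""
    tp = fn = tn = fp = 0
    for truth, label in zip(y_true, labels):
        if label == "gray":
            continue  # gray-zone cases are not counted either way
        if truth == 1:
            if label == "cancer":
                tp += 1
            else:
                fn += 1
        else:
            if label == "control":
                tn += 1
            else:
                fp += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    rng = random.Random(0)
    baseline = [4.2, 0.85, 12.0]  # hypothetical biochemical parameters
    print(perturb(baseline, 0.05, 0.10, rng))  # one synthetic image, ±5–10%

    scores = [0.91, 0.52, 0.30, 0.47, 0.70, 0.12]  # placeholder classifier outputs
    y_true = [1, 1, 0, 0, 0, 0]                     # 1 = lung cancer, 0 = control
    labels = [classify_with_gray_zone(s) for s in scores]
    print(labels, sensitivity_specificity(y_true, labels))
```

In this sketch the gray-zone cases are simply left out of the sensitivity and specificity counts. In practice such cases would more likely be referred for further examination than discarded.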

Full Text