Abstract
Background: From a mathematical point of view, medical diagnostic problems are data classification tasks. It is important to understand how much distortions in the primary diagnostic information, in particular in the results of biochemical tests, contribute to classification errors.
Aims: To determine how the prediction result depends on the variability of the primary diagnostic information, using a model classifier as an example.
Materials and methods: The case-control study enrolled patients divided into 2 groups: the main group (diagnosed with lung cancer, n=200) and the control group (conditionally healthy, n=500). All participants completed a questionnaire and underwent a biochemical study of saliva. Patients of the main group and the comparison group were hospitalized for surgical treatment, after which the diagnosis was verified histologically. The biochemical composition of saliva was determined spectrophotometrically. Based on the data obtained, a model classifier for the diagnosis of lung cancer (a random forest) was constructed. Deviations in the specified ranges (±1–5%, ±5–10%, ±10–15%) were introduced into each parameter underlying the classifier, creating synthetic images. The classification results were then evaluated by cross-validation.
Results: The baseline diagnostic characteristics of the model classifier were determined (sensitivity 72.5%, specificity 86.0%). As the deviations of the synthetic images from the baseline increase, the diagnostic characteristics of the general classification deteriorate. The confident classification, by contrast, gives higher values (sensitivity 81.8%, specificity 93.1%). In the confident classification, similar images that fall into different classes are removed, whereas in the general classification they are taken into account. The difference between the two classification methods is associated with images for which the classifier returns a class-membership value in the range 0.45–0.55. Therefore, a third class, the so-called gray zone (0.4–0.6), should be introduced into the classifier, since the probability of an erroneous diagnosis is significantly increased in this range.
Conclusions: The results obtained suggest that measurement error in the range ±1–15% does not significantly affect the quality of the classification.
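The two mechanical steps named in the abstract, introducing deviations into each parameter and assigning a gray zone, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `perturb` and `three_way_label`, the uniform sampling of the deviation magnitude, the random sign, and the example parameter values are all assumptions made for illustration. Only the deviation bands and the 0.4–0.6 gray zone come from the text.

```python
import random

def perturb(sample, lo, hi, rng):
    """Return a synthetic copy of `sample` in which every parameter is
    shifted by a relative deviation with magnitude in [lo, hi].

    Assumption: the magnitude is drawn uniformly and the sign is chosen
    at random, independently for each parameter."""
    synthetic = []
    for value in sample:
        magnitude = rng.uniform(lo, hi)   # e.g. 0.05..0.10 for the ±5–10% band
        sign = rng.choice((-1.0, 1.0))    # the deviation may go either way
        synthetic.append(value * (1.0 + sign * magnitude))
    return synthetic

def three_way_label(p_cancer, gray_lo=0.4, gray_hi=0.6):
    """Map a class-membership value to one of three classes.

    Values inside [gray_lo, gray_hi] are not forced into either
    diagnostic class; they are reported as the gray zone."""
    if gray_lo <= p_cancer <= gray_hi:
        return "gray zone"
    return "lung cancer" if p_cancer > gray_hi else "control"

rng = random.Random(0)                    # fixed seed for repeatability
base = [4.2, 0.85, 130.0]                 # made-up parameter values, not study data
synthetic = perturb(base, 0.05, 0.10, rng)
labels = [three_way_label(p) for p in (0.10, 0.50, 0.72)]
```

In this sketch a class-membership value of 0.50 would be reported as gray zone rather than assigned to either diagnosis, which is the behavior the abstract argues for.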
Highlights
From a mathematical point of view, medical diagnostic problems are data classification tasks
It is important to understand how much distortions in the primary diagnostic information, in particular in the results of biochemical tests, contribute to classification errors
Deviations in the specified ranges (±1–5%, ±5–10%, ±10–15%) were introduced into each parameter underlying the classifier, creating synthetic images
Summary
Based on the data obtained, a model classifier for the diagnosis of lung cancer (a random forest) was constructed. The difference between the classification methods is associated with images for which the classifier returns a class-membership value in the range 0.45–0.55. For citation: Gundyrev I.A., Bel'skaya L.V., Kosenok V.K., Sarf E.A. The use of synthetic images for solving the classification problem on the example of lung cancer diagnosis. For each base image (the test results of a specific patient), we treat the images with small deviations in each parameter as a single object. As a convenient model for building the classifier, we chose the lung cancer diagnostic algorithm that we developed earlier [11]. This algorithm does not comply with the clinical guidelines for the diagnosis of lung cancer; however, it is based on multivariate statistical data processing, which makes it possible to demonstrate how synthetic images can be applied to classification problems. The same approach can subsequently be used with any other algorithm. Biochemical studies were carried out in the laboratory of KhimServis LLC (Omsk, Russian Federation).
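Treating a base image and its synthetic copies as a single object leads to the two ways of scoring described in the abstract: the general classification counts every image, while the confident classification drops objects whose copies fall into different classes. The Python sketch below illustrates that difference on made-up labels. The helper name `sens_spec`, the patient identifiers, and the predicted labels are hypothetical, and no classifier is trained here.

```python
def sens_spec(pairs):
    """Compute (sensitivity, specificity) from (truth, prediction) pairs,
    where 1 = lung cancer and 0 = control. Returns None for a metric
    whose denominator is empty."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec

# Hypothetical true diagnoses per base image (patient).
truth = {"p1": 1, "p2": 1, "p3": 0, "p4": 0, "p5": 1}

# Hypothetical predicted labels for each base image and its synthetic copies.
predicted = {
    "p1": [1, 1, 1],
    "p2": [1, 0, 1],   # copies disagree
    "p3": [0, 0, 0],
    "p4": [0, 1, 0],   # copies disagree
    "p5": [0, 0, 0],   # copies agree, but on the wrong class
}

# General classification: every image is counted on its own.
general_pairs = [(truth[b], p) for b, labels in predicted.items() for p in labels]

# Confident classification: keep an object only if all its copies agree.
confident_pairs = [
    (truth[b], labels[0])
    for b, labels in predicted.items()
    if len(set(labels)) == 1
]

general = sens_spec(general_pairs)
confident = sens_spec(confident_pairs)
```

Note that agreement among copies does not guarantee a correct result: in this toy data `p5` is classified consistently but wrongly, so it still counts as a false negative in the confident classification.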