Abstract

Much of modern medical research is supported by statistics and machine learning techniques. For these studies to be reliable, it is important to measure how effectively the developed models make predictions on real medical data. Cross-validation provides a principled method for measuring the effectiveness of models and for comparing models with one another. This paper presents a theoretical and empirical analysis of common cross-validation methods. A comparative analysis was conducted using a sample logistic regression model together with six different validation methods. The model was developed to determine the level of thyroglobulin, a hormone used as a thyroid cancer marker. The marker indicates an increased risk of thyroid cancer metastasis and is commonly used in metastasis diagnosis. However, the literature does not provide a single thyroglobulin threshold value that indicates an increased risk of metastasis. This paper confirms that one of the main reasons for this gap is the high variance of the developed models. This research argues that the choice of validation technique also influences both the measures of model quality and the threshold value. The results show a high discrepancy in the determined threshold values. However, iterative methods (such as sampling and bootstrap) seem to produce more stable outcomes.
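To show why the validation scheme can shift a threshold estimate, the following is a minimal sketch. It is not the paper's model, data, or set of six methods. It assumes a hypothetical single marker value per patient, synthetic Gaussian data (negatives around 0, positives around 2), and a one-feature logistic regression fitted by plain gradient descent. The threshold is taken as the marker value at which the predicted probability crosses 0.5. The sketch compares 5-fold cross-validation with bootstrap resampling, where the bootstrap evaluates on the out-of-bag cases. All function names are illustrative only.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    # One-feature logistic regression p(y=1|x) = sigmoid(b0 + b1*x),
    # fitted by batch gradient descent on the mean log-loss (an assumption;
    # any standard logistic regression solver would do).
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def threshold(b0, b1):
    # Marker value at which the predicted probability equals 0.5.
    return -b0 / b1


def accuracy(b0, b1, xs, ys):
    # Fraction of cases whose predicted class (p >= 0.5) matches the label.
    hits = sum((sigmoid(b0 + b1 * x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


def kfold_thresholds(xs, ys, k, rng):
    # k-fold cross-validation: each fold is held out once and the model is
    # refitted on the remaining k-1 folds.
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    out = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        acc = accuracy(b0, b1, [xs[i] for i in test], [ys[i] for i in test])
        out.append((threshold(b0, b1), acc))
    return out


def bootstrap_thresholds(xs, ys, n_boot, rng):
    # Bootstrap: fit on n draws with replacement, then evaluate on the
    # out-of-bag cases that were never drawn.
    n = len(xs)
    out = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]
        drawn = set(draw)
        oob = [i for i in range(n) if i not in drawn]
        if not oob:
            continue  # no held-out cases in this replicate
        b0, b1 = fit_logistic([xs[i] for i in draw], [ys[i] for i in draw])
        acc = accuracy(b0, b1, [xs[i] for i in oob], [ys[i] for i in oob])
        out.append((threshold(b0, b1), acc))
    return out


def summarize(results):
    # Mean and spread of the threshold estimates, plus mean held-out accuracy.
    thr = [t for t, _ in results]
    acc = [a for _, a in results]
    return statistics.mean(thr), statistics.stdev(thr), statistics.mean(acc)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic marker values: negatives ~ N(0, 1), positives ~ N(2, 1).
    ys = [0] * 50 + [1] * 50
    xs = [rng.gauss(2.0 * y, 1.0) for y in ys]
    for name, res in [("5-fold CV", kfold_thresholds(xs, ys, 5, rng)),
                      ("bootstrap", bootstrap_thresholds(xs, ys, 30, rng))]:
        m, s, a = summarize(res)
        print(f"{name}: threshold {m:.2f} +/- {s:.2f}, accuracy {a:.2f}")
```

The spread (standard deviation) of the threshold across folds or bootstrap replicates is one simple way to quantify the model variance discussed above. With only five folds, that spread is itself estimated from very few values, which is one plausible reason iterative resampling tends to give steadier summaries.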
