Abstract

Cancer is one of the deadliest contributors to the rising human mortality rate, which makes it an important subject of study. However, the traditional manual procedures for diagnosis and prognosis are time-consuming, even for a professional medical practitioner. A model that robustly predicts the state of a tumor (i.e., probable cancer) would therefore spare most patients the toxic side effects and additional medical fees incurred by unnecessary treatment. To this end, logistic regression is combined with a machine learning algorithm, Learning Vector Quantization, to derive a predictive model. The model is built in two phases. Phase 1 is the pretreatment of our data from Kaggle, including normalization, classification and feature selection. Feature selection extracts 14 variables based on their level of importance. The models are then built on these 14 variables and one output Y, taking the value 0 or 1, derived from the classification process. Restricting the model to these 14 variables significantly reduces the work needed for prediction. Phase 2 applies the relevant methodology to produce our model and examine its efficiency. To test the ability of the trained logistic models to recognize cancer, we analyzed residual samples that were not previously used in the training procedure and correctly classified them in all cases. The model is evaluated with both the AUC-ROC curve and the confusion matrix, which are powerful statistical approaches. The AUC value after calculation is , which supports the validity and efficiency of the model. In addition, the confusion matrix reveals an accuracy of 0.9787 (out of 1).
The model can be used to forecast the probability of cancer from concrete measurements of a tumor. This may avoid the exorbitant cost of using certain delicate medical machines, such as X-ray equipment. Moreover, it provides foundational statistics for the application of modern AI technology in the field of cancer prediction.
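
The two phases described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the data are synthetic (two Gaussian clusters standing in for tumor measurements, not the Kaggle dataset), the function names are hypothetical, the Learning Vector Quantization feature-selection step is omitted, and the logistic model is fitted with plain batch gradient descent. Normalization is applied before the train/test split only to keep the sketch short.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, w, b):
    """Predicted probability of class 1 for every row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [p - t for p, t in zip(predict_proba(X, w, b), y)]
        for j in range(d):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


def confusion_matrix(y_true, y_pred):
    """Return the counts (tn, fp, fn, tp) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    return (pairs.count((0, 0)), pairs.count((0, 1)),
            pairs.count((1, 0)), pairs.count((1, 1)))


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Phase 1 (assumed stand-in data): synthetic measurements, then normalization.
random.seed(0)
raw = ([[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(150)]
       + [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(150)])
labels = [0] * 150 + [1] * 150
data = list(zip(min_max_normalize(raw), labels))
random.shuffle(data)
train, test = data[:210], data[210:]  # held-out samples are never trained on

# Phase 2: fit the logistic model, then evaluate on the held-out samples.
w, b = fit_logistic([x for x, _ in train], [t for _, t in train])
y_test = [t for _, t in test]
scores = predict_proba([x for x, _ in test], w, b)
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred)
acc = (tn + tp) / len(y_test)
auc_value = auc(y_test, scores)
print("confusion matrix (tn, fp, fn, tp):", (tn, fp, fn, tp))
print("accuracy:", acc, "AUC:", auc_value)
```

Accuracy here is the share of held-out samples on the diagonal of the confusion matrix, which is how a figure such as 0.9787 would be read; the AUC is computed from the ranking of predicted probabilities and does not depend on the 0.5 threshold.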
