Abstract

In order to improve the classification effect of the logistic regression (LR) model for breast cancer prediction, a new hybrid feature selection method is proposed to process the data, using the Pearson correlation test and the iterative random forest algorithm based on out-of-bag estimation (RF-OOB) to screen the optimal 17 features as inputs to the model. Secondly, the LR is optimized using the batch gradient descent (BGD-LR) algorithm to train the loss function of the model to minimize the loss. In order to protect the privacy of breast cancer patients, a differential privacy protection technology is added to the BGD-LR model, and an LR optimization model based on differential privacy with batch gradient descent (BDP-LR) is constructed. Finally, experiments are carried out on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Meanwhile, accuracy, precision, recall, and F1-score are selected as the four main evaluation indicators. Moreover, the hyperparameters of each model are determined by the grid search method and the cross-validation method. The experimental results show that after hybrid feature selection, the optimal results of the four main evaluation indicators of the BGD-LR model are 0.9912, 1, 0.9886, and 0.9943, in which the accuracy, recall, and F1-scores are increased by 2.63%, 3.41%, and 1.76%, respectively. For the BDP-LR model, when the privacy budget ε is taken as 0.8, the classification performance and privacy protection effect of the model reach an effective balance. At the same time, the four main evaluation indicators of the model are 0.9721, 0.9975, 0.9664, and 0.9816, which are improved by 1.58%, 0.26%, 1.81%, and 1.07%, respectively. Comparative analysis shows that the models of BGD-LR and BDP-LR constructed in this paper perform better than other classification models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call