Abstract

To address the problems of noise and irrelevant features in data processing, a feature selection method based on the stacking framework is proposed. K-Fold cross-validation is used to train and save DNN and SVM base learners, and their prediction results serve as the input to the meta-learner, a logistic regression model that is likewise trained and saved. The weight matrix of the fully connected neural network and the correlation coefficients of the support vector machine are analyzed jointly; according to the learning results of the meta-learner, different weights are assigned to each base learner, the influence factor of each feature is computed, and the sequential backward search algorithm (SBS) is invoked to generate the optimal feature subset. In the experimental stage, a disease diagnosis model was built on the open heart-disease dataset from the Kaggle website, Stacking-SBS was applied to generate the optimal feature subset in the feature space, and the performance of the diagnostic model before and after feature selection was compared. The method was also compared with information gain (IG), the chi-square test (Chi), and correlation-based feature selection (CFS). The results show that the proposed method not only reduces model training time but also significantly improves the model's recall and F1 score, and it clearly outperforms the other three feature selection methods in terms of performance improvement. Finally, the open cardiovascular-disease dataset from the Kaggle website is used to verify the generalization ability of Stacking-SBS; the experimental results show that the method also significantly improves the performance of that disease diagnosis model.
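As a rough illustration of the pipeline described above, the sketch below approximates Stacking-SBS with scikit-learn: a StackingClassifier combines an MLP (standing in for the DNN) and an SVM under a logistic-regression meta-learner, and a backward SequentialFeatureSelector stands in for the influence-factor-guided SBS step. This is not the authors' implementation: the per-feature influence factors derived from the DNN weight matrix and SVM correlation coefficients are not reproduced, and synthetic data replaces the Kaggle heart-disease set.

```python
# Minimal sketch of a Stacking-SBS-style pipeline (assumed approximation, not the paper's code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Kaggle heart-disease data: informative, redundant, and noise features.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_redundant=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# DNN and SVM base learners stacked under a logistic-regression meta-learner;
# cv=5 mirrors the K-Fold training of the base learners.
stack = StackingClassifier(
    estimators=[
        ("dnn", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(32,),
                                            max_iter=300, random_state=0))),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=0))),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

# Backward search over features, scoring each candidate subset with the stacked model
# (a proxy for the paper's influence-factor-guided SBS).
sbs = SequentialFeatureSelector(stack, n_features_to_select=8,
                                direction="backward", cv=3, n_jobs=-1)
sbs.fit(X_train, y_train)
selected = np.flatnonzero(sbs.get_support())
print("Selected feature indices:", selected)

# Compare the diagnostic model before and after feature selection.
full_score = cross_val_score(stack, X_train, y_train, cv=3).mean()
sub_score = cross_val_score(stack, X_train[:, selected], y_train, cv=3).mean()
print(f"CV accuracy, all features: {full_score:.3f}  selected subset: {sub_score:.3f}")
```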
