Abstract

This report addresses the problem of predicting customers' willingness to purchase insurance. To handle the class imbalance in the data, it applies SMOTE over-sampling and NearMiss under-sampling, then builds eight base or ensemble models, including Logistic Regression and Decision Tree, and compares them using f1_score as the evaluation metric. The results show that over-sampling performs better than under-sampling and that the ensemble models generally outperform the base models. The best approach is over-sampling combined with Adaptive Boosting or Extreme Gradient Boosting. However, the highest f1_score across all results is only 0.4, which indicates that every method considered in this report is limited in its ability to solve this problem. The imbalance-handling methods, prediction models, and ensemble algorithms discussed in the report are nonetheless of high practical value. The report expects that other models and methods exist that can significantly improve prediction performance on this dataset.
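To illustrate why f1_score, rather than accuracy, is a sensible measure on imbalanced data, the following minimal sketch computes both metrics by hand. The toy labels, the 10% positive rate, and the helper names `f1_score_binary` and `accuracy` are assumptions made for illustration; they are not taken from the report's dataset or code.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 100 customers, only 10 of whom are interested (label 1).
y_true = [1] * 10 + [0] * 90

# A trivial model that always predicts "not interested".
always_negative = [0] * 100

# A model that finds 6 of the 10 positives but raises 12 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 12 + [0] * 78

print("always negative: accuracy", accuracy(y_true, always_negative),
      "f1", f1_score_binary(y_true, always_negative))
print("some model:      accuracy", accuracy(y_true, some_model),
      "f1", round(f1_score_binary(y_true, some_model), 3))
```

In this toy setting the trivial model reaches 0.9 accuracy with an F1 of 0, while the model that actually identifies positives has lower accuracy (0.84) but an F1 of about 0.43. This is the kind of gap that motivates scoring imbalanced problems by F1.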
