Due to the social awareness of risk control, we are witnessing the popularization of the insurance concept and the rapid development of financial insurance. The performance of the insurance industry is highly competitive; thus, in order to develop new and old business from existing clients, information on the renewal of client premiums, purchase of new policies, and new client referrals has become an important research topic in this field. However, based on a review of published literature, few scholars have engaged in relevant research on the above topics by data mining, which motivated the formation of this study, hoping to bridge this gap. We constructed 10 mixed classification prediction models (called Models A–J) using advanced data mining techniques. Moreover, 19 conditional attributes (coded as X1–X19) were selected from the collected insurance client database, plus three different decision attributes (coded as X20–X22): whether to pay the renewal insurance premium, whether to buy a new insurance policy, and whether to introduce new clients. In terms of technical methods, we used two data pretreatment techniques, attribute selection and data discretization, combined with different methods of disassembly in proportion and data cross-validation to conduct data analysis of the collected experimental data set. We also combined and calculated 23 important classification algorithms (or classifiers) in seven different classifications of data mining techniques (i.e., decision tree, Bayes, Function, Lazy, Meta, Mise, and Rule). In terms of the experimental results of insurance data, this study has the following important contributions and findings: (1) finding the best classifier; (2) finding the optimal mixed classification model; (3) determining the best disassembly in proportion; (4) comparing the performance of different disassembly in proportion and data cross-validation methods; (5) determining the important factors influencing the decision attribute “whether to purchase a new insurance policy”, including the time interval to the first purchase, the number of valid policies, the total number of purchased policies, the family salary structure, and gender; and (6) building a knowledge base of decision rules and criteria with the decision tree C4.5 technology, which shall be provided to relevant stakeholders such as insurance dealers and insurance salespeople as a reference for looking for valid clients in the future, and is conducive to the rapid expansion of insurance business. Finally, the important research findings and management implications of this study can serve as a basis for further study of sustainable insurance by academic researchers.
Read full abstract