Abstract

Consumer preference prediction aims to predict consumers' future purchases from their historical behavioral data. The predictions, produced with machine learning algorithms, inform commercial decisions and help improve the consumer experience. However, missing values and class imbalance in consumer behavioral data often undermine the effectiveness of machine learning algorithms. Several methods have been proposed to address missing data or class imbalance, but few studies have examined how missingness mechanisms, imputation algorithms, imbalance treatments, and the performance of classifiers trained on imputed data relate to one another. In this study, we propose an adaptive process that selects the optimal combination of amputation (the mechanism used to generate missing values), imputation, imbalance treatment, and classification based on classification performance. Our research extends the literature by showing significant interaction effects between 1) amputation mechanisms and imputation algorithms, 2) imputation algorithms and imbalance treatments, and 3) imbalance treatments and classification algorithms. Using three consumer behavioral datasets from the UCI Machine Learning Repository, we show empirically that Random Forest achieves better overall performance than Logit, SVM, or Decision Tree. Moreover, Logit, the most widely used classification method, suffers most from class imbalance in real-world datasets. Furthermore, MetaCost is consistently the best imbalance treatment across imputation techniques and missing-value mechanisms.
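The abstract does not spell out how the adaptive selection works. The sketch below assumes it amounts to scoring every amputation/imputation/imbalance-treatment/classifier combination on held-out data and keeping the best one. Every component is a hypothetical toy stand-in, not the paper's method: synthetic two-feature data instead of the UCI datasets, simple MCAR/MAR amputers, mean and zero imputation, random oversampling in place of treatments such as MetaCost (not implemented here), majority-class and nearest-centroid classifiers in place of Random Forest, Logit, SVM, and Decision Tree, and balanced accuracy as an assumed selection metric.

```python
import itertools
import random
from statistics import mean

# Hypothetical toy components; none of these are the paper's implementations.

def make_toy_data(n_neg, n_pos, rng):
    """Two-feature data: class 0 centred at (0, 0), class 1 at (2, 2)."""
    rows = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(n_neg)]
    rows += [([rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)], 1) for _ in range(n_pos)]
    rng.shuffle(rows)
    return rows

# Amputation: simulate a missing-value mechanism.
def ampute_mcar(rows, rng, rate=0.2):
    """MCAR: each value is blanked independently with probability `rate`."""
    return [([None if rng.random() < rate else v for v in x], y) for x, y in rows]

def ampute_mar(rows, rng):
    """MAR: feature 1 is blanked more often when the observed feature 0 is large."""
    out = []
    for x, y in rows:
        p = 0.4 if x[0] > 1.0 else 0.05
        out.append(([x[0], None if rng.random() < p else x[1]], y))
    return out

# Imputation: fitted on training rows, returns a transform for any rows.
def fit_mean_imputer(train):
    means = []
    for j in range(len(train[0][0])):
        observed = [x[j] for x, _ in train if x[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    return lambda rows: [([means[j] if v is None else v for j, v in enumerate(x)], y)
                         for x, y in rows]

def fit_zero_imputer(train):
    return lambda rows: [([0.0 if v is None else v for v in x], y) for x, y in rows]

# Imbalance treatment: applied to training data only.
def no_treatment(rows, rng):
    return rows

def random_oversample(rows, rng):
    """Duplicate random minority-class rows until both classes are equal in size."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    return rows + [rng.choice(minority) for _ in range(len(majority) - len(minority))]

# Classifiers: fitted on rows, return a predict function.
def fit_majority(rows):
    labels = [y for _, y in rows]
    top = max(sorted(set(labels)), key=labels.count)
    return lambda x: top

def fit_nearest_centroid(rows):
    cents = {c: [mean(col) for col in zip(*[x for x, y in rows if y == c])] for c in (0, 1)}
    return lambda x: min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so always predicting the majority class scores only 0.5."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)

def select_best(train, test, amputers, imputers, treatments, classifiers, seed=0):
    """Score every combination of stages and return the best key plus all scores."""
    results = {}
    for (a, amp), (i, imp), (t, treat), (c, fit) in itertools.product(
            amputers.items(), imputers.items(), treatments.items(), classifiers.items()):
        rng = random.Random(seed)  # same missingness pattern for every use of an amputer
        tr, te = amp(train, rng), amp(test, rng)
        transform = imp(tr)                     # imputer sees training rows only
        model = fit(treat(transform(tr), rng))  # imbalance treatment on training rows only
        preds = [model(x) for x, _ in transform(te)]
        results[(a, i, t, c)] = balanced_accuracy([y for _, y in te], preds)
    return max(results, key=results.get), results

def run_demo(seed=42):
    data_rng = random.Random(seed)
    train = make_toy_data(270, 30, data_rng)  # 10% positives
    test = make_toy_data(90, 10, data_rng)
    return select_best(
        train, test,
        amputers={"MCAR": ampute_mcar, "MAR": ampute_mar},
        imputers={"mean": fit_mean_imputer, "zero": fit_zero_imputer},
        treatments={"none": no_treatment, "oversample": random_oversample},
        classifiers={"majority": fit_majority, "nearest_centroid": fit_nearest_centroid},
    )

if __name__ == "__main__":
    best, results = run_demo()
    print("best combination:", best, "balanced accuracy:", round(results[best], 3))
```

Balanced accuracy is used here because plain accuracy would reward the majority-class baseline on a 90/10 test set, which mirrors the abstract's point that imbalance can make a classifier look better than it is. In practice the stages would be swapped for real imputers, imbalance treatments, and classifiers, and the scoring would use cross-validation rather than a single split.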
