In this study, we predict customer purchase behavior using several machine learning models, with the aim of better understanding customer tendencies and improving marketing strategies. We use a dataset of demographic and behavioral attributes: age, gender, annual income, number of purchases, product category, time spent on the website, loyalty program membership, and discounts availed. Our analysis comprises data preprocessing, exploratory data analysis (EDA), and feature engineering. We then train and evaluate six machine learning models (Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost) and assess each with accuracy, precision, recall, F1-score, and ROC AUC. The results indicate that two ensemble models, Random Forest and Gradient Boosting, outperform the other models in accuracy and ROC AUC. We conclude that these ensemble models are effective for predicting customer purchase behavior and can help businesses tailor their marketing efforts. Future research could explore additional features, more advanced models, and real-time prediction.
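To make the five evaluation criteria concrete, the following is a minimal, dependency-free sketch of how they can be computed, assuming a binary purchase/no-purchase target and a model that outputs a score per customer. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions. They are not the study's code or data. Libraries such as scikit-learn provide equivalent functions.

```python
from __future__ import annotations

from typing import Sequence


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Accuracy, precision, recall, F1 and ROC AUC for a binary label (1 = purchase).

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    # Turn continuous scores into hard predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC: probability that a random positive is scored above a random
    # negative, with ties counted as one half. Uses raw scores, not y_pred.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    pairs = len(pos) * len(neg)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / pairs if pairs else float("nan")

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc,
    }


if __name__ == "__main__":
    # Toy example (made-up values): true purchase labels and model scores.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
    metrics = binary_metrics(labels, scores)
    print({k: round(v, 3) for k, v in metrics.items()})
    # {'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667, 'roc_auc': 0.778}
```

Accuracy, precision, recall, and F1 depend on the decision threshold. ROC AUC is computed from the raw scores and is threshold-independent, which is why it is a useful complement when comparing models such as those listed above. The pairwise AUC loop is quadratic in the number of examples. A rank-based formula is preferable for large datasets.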