This research presents a comprehensive approach to customer segmentation using Recency, Frequency, and Monetary (RFM) analysis, combining statistical insights, data visualization, and machine learning techniques. The study uses a real-world dataset from a retail environment and categorizes customers by how recently they purchased, how often they visit, and how much they spend at the store. The code begins with data preparation and exploration, ensuring data integrity by addressing issues such as negative quantities and missing customer identifiers. The Recency, Frequency, and Monetary metrics are then computed, giving a holistic view of customer engagement and spending patterns. Violin plots, histograms, and box plots convey the distribution of these metrics. Next, customers are segmented by quantile: each RFM metric is divided into four quantile-based segments, and the resulting quantile labels are applied to the dataset and combined into a compound RFM quantile that carries recency, frequency, and monetary information together. This compound quantile is the basis for defining distinct customer segments. To make the segments easier to interpret, the study introduces a set of rules that label customers according to their RFM quantiles, yielding segments such as "Best Customer," "Loyal Customer," "Big Spender," "Dead Beats," and "Lost Customer." The resulting segmentation is presented through histograms and a pie chart that show how customers are distributed across the segments.
Moreover, the research integrates machine learning models, including XGBClassifier and CatBoostClassifier, to explore automating the segmentation process and predicting customer segments from historical data. However, the machine learning code is included only as commented-out sections, leaving room for further exploration and experimentation. In conclusion, this research contributes a detailed code implementation of RFM-based customer segmentation. The visualizations aid the interpretation of customer behavior, while the machine learning models open avenues for predictive analytics in customer segmentation. The approach gives businesses a basis for tailoring marketing and customer relationship strategies to individual customer segments.
KEYWORDS: Customer Segmentation, Frequency, Monetary Value, Recency, RFM Analysis.
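The pipeline summarized in the abstract (cleaning, RFM computation, quartile labels, a compound RFM code, and rule-based segment names) can be illustrated with a minimal pandas sketch. This is not the study's code: the column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`), the convention that quartile 1 is the best score, the exact labeling rules, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def rfm_table(df, snapshot_date):
    """Compute Recency, Frequency and Monetary per customer (assumed column names)."""
    # Cleaning steps named in the abstract: drop rows with a missing customer
    # identifier and rows with a negative quantity.
    clean = df.dropna(subset=["CustomerID"])
    clean = clean[clean["Quantity"] >= 0].copy()
    clean["Amount"] = clean["Quantity"] * clean["UnitPrice"]
    return clean.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Amount", "sum"),
    )


def label_segment(r, f, m):
    """Hypothetical rules; the study's actual rules are not given in the abstract."""
    if (r, f, m) == (1, 1, 1):
        return "Best Customer"
    if f == 1:
        return "Loyal Customer"
    if m == 1:
        return "Big Spender"
    if (r, f, m) == (4, 4, 4):
        return "Dead Beats"
    if r == 4:
        return "Lost Customer"
    return "Other"


def segment_customers(rfm):
    """Add quartile scores (1 = best, an assumed convention), RFM code and segment."""
    out = rfm.copy()
    # rank(method="first") breaks ties so qcut always finds four distinct bins.
    # Low recency is good, so the lowest quartile gets score 1.
    out["R"] = pd.qcut(out["Recency"].rank(method="first"), 4,
                       labels=[1, 2, 3, 4]).astype(int)
    # High frequency and high spend are good, so the highest quartile gets score 1.
    out["F"] = pd.qcut(out["Frequency"].rank(method="first"), 4,
                       labels=[4, 3, 2, 1]).astype(int)
    out["M"] = pd.qcut(out["Monetary"].rank(method="first"), 4,
                       labels=[4, 3, 2, 1]).astype(int)
    # Compound RFM quantile, e.g. "111" for a customer in the top quartile on all three.
    out["RFM"] = out["R"].astype(str) + out["F"].astype(str) + out["M"].astype(str)
    out["Segment"] = [label_segment(r, f, m)
                      for r, f, m in zip(out["R"], out["F"], out["M"])]
    return out


# Toy transactions, fabricated for illustration only: "A" buys most recently,
# most often and spends the most; "D" is the opposite.
rows = []
for rank, cust in enumerate(["A", "B", "C", "D"]):
    for k in range(4 - rank):
        date = pd.Timestamp("2023-12-31") - pd.Timedelta(days=30 * rank + k)
        rows.append((cust, f"{cust}-{k}", date, 2, 10.0))
rows.append((None, "X-0", pd.Timestamp("2023-12-01"), 1, 5.0))      # missing ID
rows.append(("B", "B-ret", pd.Timestamp("2023-12-15"), -1, 10.0))   # negative quantity
toy = pd.DataFrame(
    rows, columns=["CustomerID", "InvoiceNo", "InvoiceDate", "Quantity", "UnitPrice"]
)

segments = segment_customers(rfm_table(toy, pd.Timestamp("2024-01-01")))
print(segments[["Recency", "Frequency", "Monetary", "RFM", "Segment"]])
```

Ranking before calling `pd.qcut` is a design choice for skewed retail data, where many customers share the same frequency and plain quartile edges can collide. The trade-off is that tied customers may land in different quartiles depending on row order.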