Background: Association rules encode common patterns and structures identified in datasets. They can be derived by association rule mining (ARM) algorithms. Unlike the output of many other prediction algorithms, association rules are human-readable and yield comprehensible predictions.
Objective: Classical ARM algorithms, such as Apriori or FP-growth, cannot process interval- or ratio-scaled data (quantitative variables), which limits their applicability.
Results: We address this restriction in classical ARM algorithms, making it possible to process quantitative variables on the right side of a rule (the consequent). Our approach applies the Kullback–Leibler divergence (KLD) to identify rules, so that complete data distributions are considered rather than only summary statistics. We demonstrate the approach on, among other examples, predicting the length of stay of intensive care patients, that is, the number of days a patient spends in the intensive care unit. We further demonstrate it by predicting the credit score of bank customers and the contract duration of customers of a fictional telco company, based on two publicly available datasets.
Conclusion: This paper presents a new approach for predicting quantitative variables in ARM. We demonstrate it using the FP-growth algorithm.
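To make the role of the KLD more concrete, the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a rule is scored by comparing the binned distribution of the quantitative target among the rows covered by the rule's left side with the distribution over all rows. The function names, the bin edges, the smoothing constant and the toy records are all hypothetical and chosen only for illustration.

```python
import math


def histogram(values, edges, alpha=0.5):
    """Smoothed relative frequencies of `values` over the bins given by `edges`.

    Bins are half-open [lo, hi), except the last one, which also includes its
    upper edge. Values outside the edges are ignored. `alpha` is an assumed
    additive-smoothing constant that keeps every bin probability above zero.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) + alpha * n_bins
    return [(c + alpha) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rule_divergence(rows, antecedent, target, edges):
    """KLD between the target distribution under the rule and over all rows.

    `antecedent` maps column names to required values (the left side of a
    rule); `target` names the quantitative column on the right side.
    Returns 0.0 when the antecedent covers no rows.
    """
    covered = [
        r[target]
        for r in rows
        if all(r.get(k) == v for k, v in antecedent.items())
    ]
    if not covered:
        return 0.0
    p = histogram(covered, edges)
    q = histogram([r[target] for r in rows], edges)
    return kl_divergence(p, q)


# Hypothetical toy records: length of stay in days, not real patient data.
rows = [
    {"ventilated": True, "los_days": 9},
    {"ventilated": True, "los_days": 12},
    {"ventilated": True, "los_days": 15},
    {"ventilated": True, "los_days": 21},
    {"ventilated": False, "los_days": 1},
    {"ventilated": False, "los_days": 2},
    {"ventilated": False, "los_days": 2},
    {"ventilated": False, "los_days": 3},
    {"ventilated": False, "los_days": 4},
    {"ventilated": False, "los_days": 5},
]
edges = [0, 3, 7, 14, 30]  # assumed bin edges in days

score = rule_divergence(rows, {"ventilated": True}, "los_days", edges)
print(f"KLD for the rule ventilated=True: {score:.3f}")
```

Under these assumptions, a larger divergence means the covered rows have a target distribution that differs more from the overall one, so the whole distribution shapes the score rather than a single mean or median. How the paper integrates such a score into FP-growth is not stated in the abstract and is not modeled here.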