Abstract

Rapid developments in third-party online payment platforms now make it possible to record massive bank card transaction data. Clustering such transaction data is of great importance for the analysis of merchant behaviours. However, traditional methods based on generated features inevitably lose much of the information in the data. To make better use of bank card transaction data, this study investigates the possibility of using the empirical cumulative distribution of transaction amounts. As the distance between two merchants can be measured using the two-sample Kolmogorov–Smirnov test statistic, we propose the Kolmogorov–Smirnov K-means clustering approach based on this distance measure. An approximation step is conducted to ensure the feasibility of the proposed method even for large-scale transaction data, and the associated theoretical properties are investigated. Both simulations and an empirical study demonstrate that our method outperforms feature-based methods and is computationally efficient for large-scale data sets.
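
The idea of clustering merchants by the Kolmogorov–Smirnov distance between their empirical distributions can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not confirm: each empirical CDF is evaluated on a fixed grid of amounts (a stand-in for the approximation step), a cluster centre is the pointwise mean of its members' CDFs, and centres are initialised by a deterministic farthest-first rule. The names `ecdf_on_grid` and `ks_kmeans` are hypothetical.

```python
import numpy as np

def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts observations <= grid point
    return np.searchsorted(sample, grid, side="right") / sample.size

def ks_kmeans(samples, k, grid, n_iter=20):
    """K-means-style clustering with a sup-norm (KS-type) distance.

    Assumptions for illustration only: CDFs are compared on `grid`
    rather than on all observed amounts, and a centre is the plain
    mean of the member CDFs.
    """
    F = np.vstack([ecdf_on_grid(s, grid) for s in samples])  # shape (n, m)

    # Farthest-first initialisation (an assumption, chosen for determinism).
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([np.abs(F - c).max(axis=1) for c in centers], axis=0)
        centers.append(F[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # KS-type distance: largest absolute gap between CDF and centre.
        dist = np.abs(F[:, None, :] - centers[None, :, :]).max(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = labels == j
            if members.any():  # keep the old centre if a cluster is empty
                new_centers[j] = F[members].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Evaluating every CDF on a shared grid keeps the cost of each distance computation proportional to the grid size rather than to the number of transactions, which is one plausible way to keep such a method feasible for large data sets.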
