Abstract
Modern FinTech applications are challenged by enormous volumes of financial data. One way to handle these volumes is to adopt a streaming setting, in which data are available to the algorithms only for a very short time: when a new data point (a financial transaction) is generated, it must be processed directly and forgotten immediately afterwards. In particular, ongoing globalization efforts in FinTech require modern fault detection methods to work efficiently through more than 10 000 financial transactions per second if they are to be deployed as a first line of defence. This article investigates two algorithms able to perform well in this demanding setting: $K$-means and FADO. Specifically, it provides support for the claim that “the use of multiple clusters does not necessarily translate into increased detection performance.” To support this claim, results are reported for a quasi-realistic case study of Anti-Money Laundering (AML) detection in real-time payment systems. We focus on two prototypical algorithms: the passive-aggressive FADO, which assumes a single cluster, and the well-known $K$-means algorithm working with $K>1$ clusters. We find, in this case, that the use of $K$-means with multiple clusters is unfavorable because 1) both tuning $K$ and the additional complexity of the $K$-means algorithm strain the computational constraints; 2) $K$-means necessarily introduces added variability (unreliability) in the results; 3) it requires dimensionality reduction, compromising the interpretability of the detections; and 4) the prevalence of singleton clusters adds unreliability to the outcome. In the presented case, this makes FADO favorable over $K$-means (with $K>1$).
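To make the streaming setting concrete, the following is a minimal sketch of a one-pass detector in which each transaction is scored, used to update the model, and then discarded. It uses a sequential (running-mean) $K$-means update and is not the FADO algorithm or the implementation evaluated in this article. The class name `StreamingKMeans`, the method `score_and_update`, the distance-to-nearest-centroid score, and the toy two-dimensional transactions are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


class StreamingKMeans:
    """Hypothetical one-pass K-means sketch: O(K*d) memory, no stored data."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def score_and_update(self, x: Sequence[float]) -> float:
        """Return the distance of x to its nearest centroid, then absorb x."""
        # Assumption: the first k points seed the centroids and score 0.0.
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.counts.append(1)
            return 0.0
        dists = [math.dist(c, x) for c in self.centroids]
        j = min(range(self.k), key=dists.__getitem__)
        score = dists[j]
        # Running-mean update of the winning centroid; x itself is not kept.
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        self.centroids[j] = [
            c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)
        ]
        return score


# Toy stream: three similar transactions followed by one unusual one.
stream = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (9.0, 9.0)]
single = StreamingKMeans(k=1)  # single-cluster baseline
scores = [single.score_and_update(x) for x in stream]
```

With `k=1` the sketch reduces to tracking a single running mean, so there is no $K$ to tune and no dependence on which points happen to seed the clusters. With `k>1`, any centroid whose entry in `counts` stays at 1 corresponds to a singleton cluster in this sketch.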