Improved TrAdaBoost and its Application to Transaction Fraud Detection

Lutao Zheng, Guanjun Liu, Mengchu Zhou, Changjun Jiang, Chungang Yan, Maozhen Li

doi:10.1109/tcss.2020.3017013

Abstract

AdaBoost is a boosting-based machine learning method that assumes the data in the training and testing sets share the same distribution and input feature space. During training, it increases the weights of the instances that are wrongly classified. However, this assumption does not hold for many real-world data sets. AdaBoost has therefore been extended to transfer AdaBoost (TrAdaBoost), which can effectively transfer knowledge from one domain to another. TrAdaBoost decreases the weights of source-domain instances that are wrongly classified during training, which makes it better suited to cases where the data follow different distributions. Can it be improved for special transfer scenarios, e.g., when the data distribution changes only slightly over time? We find that the distribution of credit card transaction data can change as users' transaction behaviors change, although these changes are slow most of the time. The changes nevertheless matter for detecting transaction fraud, since they give rise to the so-called concept drift problem. To make TrAdaBoost more suitable for this case, we propose an improved TrAdaBoost (ITrAdaBoost) in this article. It updates (i.e., increases or decreases) the weight of a wrongly classified source-domain instance according to the distribution distance from that instance to the target domain, where the distance is computed based on the theory of reproducing kernel Hilbert space (RKHS). We conduct a series of experiments on five data sets, and the results illustrate the advantage of ITrAdaBoost.
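The abstract contrasts the AdaBoost and TrAdaBoost weight updates and says ITrAdaBoost measures an instance's distance to the target domain in an RKHS, but it gives no formulas. As a hedged illustration only, the Python sketch below shows (a) the source/target weight update from the original TrAdaBoost formulation (Dai et al., 2007) with 0/1 misclassification indicators, and (b) one common way to measure a squared RKHS distance between a single instance and a target sample: the distance from the instance's feature map to the empirical kernel mean embedding of the target data. The Gaussian kernel, the `gamma` parameter, and all function names are assumptions made for illustration. The block does not implement ITrAdaBoost's actual rule for turning this distance into a weight increase or decrease, since the abstract does not state it.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); an assumed choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def rkhs_distance_to_domain(x: Vector, target: List[Vector], gamma: float = 1.0) -> float:
    """Squared RKHS distance ||phi(x) - mu_T||^2 from the feature map of x to the
    empirical mean embedding mu_T of a target-domain sample:
        k(x, x) - (2/m) * sum_j k(x, t_j) + (1/m^2) * sum_{i,j} k(t_i, t_j)
    """
    m = len(target)
    if m == 0:
        raise ValueError("target sample must be non-empty")
    self_term = rbf_kernel(x, x, gamma)
    cross_term = sum(rbf_kernel(x, t, gamma) for t in target) / m
    within_term = sum(rbf_kernel(s, t, gamma) for s in target for t in target) / (m * m)
    return self_term - 2.0 * cross_term + within_term


def tradaboost_update(
    weights: List[float],
    misclassified: List[bool],
    is_source: List[bool],
    n_source: int,
    n_rounds: int,
    eps_t: float,
) -> List[float]:
    """One TrAdaBoost weight update (Dai et al., 2007) with 0/1 errors.

    eps_t is the current learner's weighted error on the target instances and
    must lie in (0, 0.5). Weights are returned unnormalized; TrAdaBoost
    normalizes them at the start of the next round.
    """
    if not 0.0 < eps_t < 0.5:
        raise ValueError("eps_t must lie in (0, 0.5)")
    # Fixed factor <= 1 applied to misclassified source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    # Round-dependent factor < 1; dividing by it boosts misclassified target instances.
    beta_t = eps_t / (1.0 - eps_t)
    updated = []
    for w, wrong, src in zip(weights, misclassified, is_source):
        if not wrong:
            updated.append(w)             # correctly classified: unchanged
        elif src:
            updated.append(w * beta_src)  # wrong source instance: weight decreases
        else:
            updated.append(w / beta_t)    # wrong target instance: weight increases
    return updated
```

For example, `rkhs_distance_to_domain([0.0, 0.0], [[0.0, 0.0]])` is 0.0 because the instance coincides with the only target point, and with a single target point the value approaches 2 as the instance moves away under this kernel. In TrAdaBoost, the source factor is the same in every round, so a misclassified source instance is always down-weighted regardless of how close it is to the target distribution; a distance such as the one above is the kind of quantity that could make that update instance-dependent.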

Full Text