Abstract

Money laundering is a pervasive legal and economic problem that conceals criminal activity. Because identifying it is a priority for both banks and governments, machine learning algorithms have emerged as a possible strategy for detecting suspicious activity within financial institutions. We used traditional regression and supervised machine learning techniques to identify bank customers at increased risk of committing money laundering. Specifically, we assessed whether model performance differed across operationalizations of the outcome (e.g., multinomial vs. binary classification) and whether including investigator-derived novel features (e.g., averages across existing features) improved model performance. We received two proprietary datasets from Scotiabank, a large bank headquartered in Canada. The datasets included customer account information (N = 4,469) and customers’ monthly transaction histories (N = 2,827) from April 15, 2019 to April 15, 2020. We implemented traditional logistic regression, logistic regression with LASSO regularization (LASSO), K-nearest neighbours (KNN), and extreme gradient boosted models (XGBoost). Traditional logistic regression with a binary outcome and investigator-derived novel features performed best, with an F1 score of 0.79 and an accuracy of 0.72. Models with a binary outcome had higher accuracy than the multinomial models, but the F1 scores yielded mixed results. For KNN and XGBoost, performance changed little or worsened after the investigator-derived novel features were introduced; for LASSO and traditional logistic regression, these features improved performance. Our findings demonstrate that investigators should consider different operationalizations of the outcome, where possible, and include novel features derived from existing features to potentially improve the detection of customers at risk of committing money laundering.
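
The abstract refers to three ideas that a small example can make concrete: deriving a feature as an average across existing features, collapsing a multinomial outcome to a binary one, and scoring predictions with both accuracy and F1. The sketch below is illustrative only. The customer IDs, transaction totals, the three risk levels ("low", "medium", "high"), and the predictions are all hypothetical; none of them come from the study's data or describe the authors' actual pipeline.

```python
from statistics import mean

# Hypothetical monthly transaction totals per customer (not study data).
monthly_totals = {
    "c1": [1200.0, 900.0, 1500.0],
    "c2": [50.0, 75.0, 25.0],
    "c3": [10000.0, 0.0, 8000.0],
}

# Derived feature: the average across existing monthly features.
avg_monthly_total = {cid: mean(vals) for cid, vals in monthly_totals.items()}


def to_binary(label):
    """Collapse an assumed three-level risk label into a binary outcome."""
    return 0 if label == "low" else 1


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical multinomial labels and binary model predictions.
labels = ["low", "medium", "high", "high", "low", "medium", "high", "medium"]
y_true = [to_binary(label) for label in labels]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1]

print(avg_monthly_total)
print(accuracy(y_true, y_pred), round(f1_score(y_true, y_pred), 3))
```

In this toy data the positive class is the majority, so F1 (about 0.83) is higher than accuracy (0.75). The two metrics answer different questions, which is one reason a model comparison can look different depending on which one is reported.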

