Supporting crime script analyses of scams with natural language processing

Zeya Lwin Tun,Daniel Birks

doi:10.1186/s40163-022-00177-w

Abstract

In recent years, internet connectivity and the ubiquitous use of digital devices have afforded a landscape of expanding opportunity for the proliferation of scams involving attempts to deceive individuals into giving away money or personal information. The impacts of these schemes on victims have shown to encompass social, psychological, emotional and economic harms. Consequently, there is a strong rationale to enhance our understanding of scams in order to devise ways in which they can be disrupted. One way to do so is through crime scripting, an analytical approach which seeks to characterise processes underpinning crime events. In this paper, we explore how Natural Language Processing (NLP) methods might be applied to support crime script analyses, in particular to extract insights into crime event sequences from large quantities of unstructured textual data in a scalable and efficient manner. To illustrate this, we apply NLP methods to a public dataset of victims’ stories of scams perpetrated in Singapore. We first explore approaches to automatically isolate scams with similar modus operandi using two distinct similarity measures. Subsequently, we use Term Frequency-Inverse Document Frequency (TF-IDF) to extract key terms in scam stories, which are then used to identify a temporal ordering of actions in ways that seek to characterise how a particular scam operates. Finally, by means of a case study, we demonstrate how the proposed methods are capable of leveraging the collective wisdom of multiple similar reports to identify a consensus in terms of likely crime event sequences, illustrating how NLP may in the future enable crime preventers to better harness unstructured free text data to better understand crime problems.

Full Text