Abstract

This paper applies various machine learning techniques to an anonymized policing dataset used in the EU Horizon 2020 SPIRIT project. The aim is to identify fraudulent identities and to help Law Enforcement Agencies (LEAs) find potential criminals and resolve identities during their investigations. Research on fraudulent criminal identities remains scarce, largely because good-quality data and an appropriate methodology are lacking. The data are also highly sensitive: even a small prediction error can have a serious social impact, since genuine people may be questioned while criminals go free. This paper addresses both issues by using 39 million records from the policing dataset and by prioritizing accuracy when building the model. Several machine learning approaches are trained on the dataset, and the research focuses on correctly predicting the 5 suspected fraudulent identities among the 39 million records. One of the techniques applied is a TensorFlow model built with Keras, an approach that researchers have seldom applied to detection tasks on criminal data. To evaluate the accuracy of the TensorFlow model, Support Vector Machine, Naïve Bayes and K-Nearest Neighbours models are also applied, and the outcomes of all models are compared. The goal of this research is to find the fraudulent IDs among all the anonymized IDs in the criminal dataset using TensorFlow and the three other machine learning models, and to select the best-performing model. Because the model compares pairs of names, string-matching techniques (Levenshtein edit distance, Hamming distance, Jaro-Winkler and Soundex) were first evaluated to select an effective approach before the model was built and the results analysed.
The TensorFlow model achieved the highest accuracy with comparatively low execution time, and it was the only model to correctly predict all 5 suspects in the policing dataset.
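
The abstract names four string-matching measures for comparing two names but does not say how they were computed or which library was used. The following is a minimal, self-contained Python sketch of those four measures, assuming names are compared as plain strings; the function names and example names are hypothetical. Jaro-Winkler uses the conventional scaling factor p = 0.1 with a common prefix of at most four characters, and Soundex follows the standard American rules.

```python
# Hypothetical sketch of the four name-similarity measures named in the abstract.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1] based on matching characters and transpositions."""
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_matched, b_matched = [False] * la, [False] * lb
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, lb)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = transpositions = 0
    for i in range(la):
        if a_matched[i]:
            while not b_matched[k]:
                k += 1
            if a[i] != b[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / la + matches / lb + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


_SOUNDEX_CODES = {c: d for letters, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                           ("L", "4"), ("MN", "5"), ("R", "6"))
                  for c in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        if c not in "HW":
            prev = code  # vowels reset prev to ""; H and W leave it unchanged
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    x, y = "Jonathan", "Jonathon"  # hypothetical pair of name variants
    print(levenshtein(x, y), hamming(x, y), f"{jaro_winkler(x, y):.3f}",
          soundex(x), soundex(y))
```

Edit distances count character-level differences, Jaro-Winkler rewards agreement at the start of a name, and Soundex groups names that sound alike. These measures respond differently to typical variations such as misspellings, transliterations and swapped letters, which is presumably why the abstract evaluates them before selecting an approach.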
