Improving misspelled word solving for human trafficking detection in online advertising data

Chawit Wiriyakun, Werasak Kurutach

doi:10.11591/ijece.v13i6.pp6558-6567

Abstract

Pimps use social media to advertise adult services because it is easily accessible. Law enforcement authorities therefore need computational models that facilitate the detection of human trafficking activities. The machine learning (ML) models used to detect these activities mostly rely on text classification and often omit the correction of misspelled words, which raises the risk of prediction errors. Improving data processing is therefore one strategy for enhancing the efficiency of human trafficking detection. This paper presents a novel approach to correcting spelling mistakes. The approach selects misspelled words and then replaces them with popular words that have the same meaning, based on an estimate of word probability and the context used in human trafficking advertisements. The applicability of the proposed approach was demonstrated on a labeled human trafficking dataset using three classification models: k-nearest neighbor (KNN), naive Bayes (NB), and multilayer perceptron (MLP). The higher prediction accuracy achieved with the proposed method indicates improved alerting on human trafficking compared with the other methods. The proposed approach shows potential applicability to other datasets and domains of online advertisements.
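
To make the select-and-replace idea concrete, the following is a minimal Python sketch of frequency-based spelling correction. It is not the authors' method: the abstract does not specify the algorithm, so this sketch assumes a simple unigram model in which a rare token is replaced by the most frequent corpus word within one edit of it, and it ignores the contextual estimate the paper mentions. The function names (`edits1`, `build_counts`, `correct`, `correct_text`), the `min_count` threshold, and the toy corpus are all hypothetical.

```python
from collections import Counter


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def build_counts(corpus):
    """Count lowercase whitespace-separated tokens across all documents."""
    return Counter(tok for doc in corpus for tok in doc.lower().split())


def correct(word, counts, min_count=2):
    """Keep frequent words; otherwise pick the most frequent one-edit neighbor.

    Assumption: a token seen fewer than `min_count` times is treated as a
    possible misspelling. Ties are broken alphabetically for determinism.
    """
    if counts[word] >= min_count:
        return word
    candidates = [w for w in edits1(word) if counts[w] >= min_count]
    if not candidates:
        return word  # nothing plausible found, leave the token unchanged
    return max(candidates, key=lambda w: (counts[w], w))


def correct_text(text, counts):
    """Apply `correct` to each token of a lowercase, whitespace-split text."""
    return " ".join(correct(tok, counts) for tok in text.lower().split())


# Toy corpus, invented for illustration only.
corpus = [
    "sweet massage available",
    "relaxing massage today",
    "massage new in town",
    "new girl available today",
]
counts = build_counts(corpus)
cleaned = correct_text("New masage today", counts)
```

In a pipeline like the one the abstract describes, text cleaned this way would be vectorized and passed to a classifier such as KNN, NB, or MLP. Replacing the unigram count with a context-aware probability would be needed to match the paper's description more closely.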
