Abstract
Crime refers to an action legally defined as harmful to society, and understanding the type of crime is important for preventing such actions. However, crime can occur at any time and place, which makes it difficult to predict. Data generated from previously committed crimes helps to overcome this difficulty. This study proposes a novel model for classifying criminal activities that combines Doc2Vec, which can produce a numerical representation of texts regardless of their length, with a stacking ensemble of 8 different machine-learning models. Unlike previous studies, the model processes the features as text and converts them into vectors rather than treating them as categorical variables. This makes it possible to use features that could not be used in the literature. The proposed model is tested on the dataset of a publicly distributed online competition, San Francisco Crime Classification, which contains crimes committed over 12 years. For the 15 crime categories with the most crime records, an accuracy of 99.28% was obtained, while the precision, recall, and F-score values were 99.18%, 99.38%, and 99.20%, respectively. With cross-validation (k=10), a performance of 99.80% was achieved with a standard deviation of 0.001. These performance values are higher than those of all the studies in the literature using categorical feature structures. The results show that converting criminal activity reports, which contain text-based features, into vectors with natural language processing techniques such as Doc2Vec, instead of using them categorically in model training, can directly contribute to classification performance and provide a more efficient model with less preprocessing.
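As a rough illustration of treating features as text rather than as categorical variables, the sketch below flattens one crime record into a list of tokens of the kind a document-embedding model such as Doc2Vec takes as input. It is a minimal sketch and not the authors' implementation: the function name `record_to_tokens`, the field names, and the record values are all hypothetical and chosen only for illustration.

```python
def record_to_tokens(record, fields):
    """Flatten the selected fields of one record into lowercase tokens."""
    tokens = []
    for name in fields:
        value = str(record.get(name, "")).strip().lower()
        # Split on whitespace so that free-text fields (for example an
        # address) contribute several tokens instead of one category label.
        tokens.extend(value.split())
    return tokens


# Hypothetical record: field names and values are assumptions for illustration.
record = {
    "DayOfWeek": "Wednesday",
    "PdDistrict": "NORTHERN",
    "Address": "OAK ST / LAGUNA ST",
    "Descript": "GRAND THEFT FROM LOCKED AUTO",
}
fields = ["DayOfWeek", "PdDistrict", "Address", "Descript"]

tokens = record_to_tokens(record, fields)
print(tokens)
```

In a pipeline of the kind described above, each token list would then be embedded as a fixed-length vector, and those vectors would serve as the input to the stacking ensemble. Because the fields are concatenated as text, a high-cardinality field such as an address needs no separate encoding step.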
More From: Çukurova Üniversitesi Mühendislik Fakültesi Dergisi