A Graph Based Clustering Approach for Relation Extraction From Crime Data

Priyanka Das, Danilo Pelusi, Asit Kumar Das, Weiping Ding, Janmenjoy Nayak

doi:10.1109/access.2019.2929597

Abstract

Applying natural language processing techniques to crime data can benefit several processes in the criminal justice industry. The availability of massive collections of crime reports helps law enforcement agencies when a criminal investigation is launched. During an investigation, questions keep arising: what type of crime was committed, who committed it, what happened at which place and at what time, and what actions were taken. It is not feasible for law enforcement agencies to read through these massive collections of crime reports in detail to find the answers. To tackle these problems, the proposed work considers a textual corpus containing information on crimes against women in India and extracts substantial relations between the named entities present in the corpus using a hierarchical graph-based clustering technique. To extract the relations, different types of entity pairs are chosen and the similarities among them are measured based on the intermediate context words. Based on the similarity scores, a weighted graph is formed, and a similarity threshold is set to partition the graph according to its edge weights. Through iterative application of the clustering algorithm, all the named entity pairs are grouped into clusters, each of which signifies a different crime aspect. Each cluster is characterized by the most frequent context word present in it. The proposed relation extraction scheme supports crime pattern analysis that can aid various criminal investigation requirements. The results, with optimal cluster validation indices, demonstrate the effectiveness of this method.

Highlights

Textual information from the forensic and criminal justice industries is increasing enormously, and with it the complexity of the data.
A Python-based site crawler has been designed to look through the newspaper websites and search for crime-related terms such as ‘rape’, ‘abduction’, ‘molest’ and many more.
The present work demonstrates an unsupervised approach to extracting relations from newspapers based on criminological data.

Summary

INTRODUCTION

Textual information from the forensic and criminal justice industries is increasing enormously, and with it the complexity of the data. The unsupervised approach identifies named entities in a large corpus and extracts the relational phrases existing between the entities. It helps in obtaining useful information about the entities and assists in further analysis of the text data for crime investigation. An unsupervised approach described in [9] used a named entity tagger to recognize the entities present in the ‘The New York Times (1995)’ newspaper corpus, and the intervening context words of the entities were hierarchically clustered to discover the relations. The main objective of the proposed work is to discover the relations among the identified named entities by applying a top-down hierarchical graph-based clustering technique.
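The pipeline outlined in the abstract (context-word similarity, a weighted graph, threshold-based partitioning applied iteratively, and labelling each cluster by its most frequent context word) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: cosine similarity over bag-of-words contexts stands in for the similarity measure, the threshold schedule (`threshold`, `step`) and the `max_size` stopping rule are hypothetical, and the entity pairs in `records` are invented placeholders rather than data from the corpus.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(records):
    """Weighted graph over entity pairs: edge weight = context similarity."""
    bags = [Counter(context) for _, _, context in records]
    weights = {}
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            s = cosine(bags[i], bags[j])
            if s > 0:
                weights[frozenset((i, j))] = s
    return weights


def components(nodes, weights, threshold):
    """Connected components of `nodes`, keeping only edges with weight >= threshold."""
    nodes = list(nodes)
    unseen = set(nodes)
    parts = []
    for start in nodes:
        if start not in unseen:
            continue
        unseen.discard(start)
        stack, part = [start], []
        while stack:
            u = stack.pop()
            part.append(u)
            for v in list(unseen):
                if weights.get(frozenset((u, v)), 0.0) >= threshold:
                    unseen.discard(v)
                    stack.append(v)
        parts.append(sorted(part))
    return parts


def cluster(nodes, weights, threshold=0.3, step=0.15, max_size=2):
    """Top-down split: re-partition any cluster larger than max_size at a higher threshold."""
    result = []
    for part in components(nodes, weights, threshold):
        if len(part) > max_size and threshold + step <= 1.0:
            result.extend(cluster(part, weights, threshold + step, step, max_size))
        else:
            result.append(part)
    return result


def label(part, records):
    """Most frequent context word in a cluster, or None if it has no context words."""
    counts = Counter(w for i in part for w in records[i][2])
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical (entity 1, entity 2, intermediate context words) triples.
records = [
    ("PersonA", "CityX", ["arrested", "in"]),
    ("PersonB", "CityY", ["arrested", "at"]),
    ("PersonC", "PersonD", ["abducted"]),
    ("PersonE", "PersonF", ["abducted", "by"]),
    ("PersonG", "CityZ", ["arrested", "after", "complaint", "filed"]),
]

weights = build_graph(records)
clusters = cluster(range(len(records)), weights)
for part in clusters:
    print(label(part, records), [records[i][:2] for i in part])
```

In this toy run, the three "arrested" pairs initially form one component at the starting threshold; because that component exceeds `max_size`, it is partitioned again at a higher threshold, which separates the weakly connected pair from the other two. A real system would take its entity pairs from a named entity tagger and would choose the threshold using cluster validation indices rather than the fixed schedule assumed here.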

PRELIMINARY CONCEPTS

EXPERIMENTAL RESULTS

CONCLUSION