Abstract

The rapidly growing volume of online text reports is mostly unstructured. Many state-of-the-art supervised, unsupervised and semi-supervised techniques have been developed in recent years for the automatic clustering of such reports. Annotating online crime reports is challenging because new types of crime reports are frequently generated over time. To the best of the authors’ knowledge, this is the first attempt at group incremental adaptive clustering of crime reports that integrates a neural network with rough set theory. The proposed method first identifies the named entities in each report and selects only the context words lying between a pair of entities as a phrase, so every report is described by a collection of phrases. The phrases are vectorized using GloVe, and a graph-based clustering algorithm is applied to cluster all the collected phrases. Phrases within the same cluster are treated as similar phrases, called paraphrases, and each report is represented by a binary vector whose dimension equals the number of clusters obtained: if a phrase of the report lies in a cluster, the corresponding position of the vector is set to ‘1’; otherwise it is set to ‘0’. Next, an adaptive resonance theory (ART) neural network is applied to the binary vector representations of the crime reports to generate a set of report clusters. When a new group of reports becomes available, the reports are transformed into binary vectors in the same way and rough set theory is applied to them. This step assigns many of the new reports to existing clusters; for the remaining reports, adaptive resonance theory is applied again to modify the existing clusters and possibly generate new ones. Thus, in a dynamic environment where data are generated gradually over time, the proposed group incremental clustering algorithm adapts to provide an updated set of clusters.
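The binary report representation described above can be illustrated with a short sketch. It assumes that each phrase of a report has already been given a paraphrase-cluster index in `0..k-1` by the graph-based phrase clustering (GloVe vectorization and the clustering itself are not shown), and the function name `report_to_binary_vector` is hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def report_to_binary_vector(phrase_cluster_ids: Sequence[int], num_clusters: int) -> List[int]:
    """Hypothetical helper: position c is 1 if any phrase of the report lies in cluster c, else 0."""
    vector = [0] * num_clusters
    for cluster_id in phrase_cluster_ids:
        if not 0 <= cluster_id < num_clusters:
            raise ValueError(f"cluster id {cluster_id} is outside 0..{num_clusters - 1}")
        vector[cluster_id] = 1  # repeated phrases from the same cluster still give a single 1
    return vector


# Illustrative input only: 5 paraphrase clusters, a report whose phrases fell in clusters 0, 3 and 3.
print(report_to_binary_vector([0, 3, 3], 5))  # [1, 0, 0, 1, 0]
```

Because the vector only records presence, two reports that express the same facts with different paraphrases map to the same positions, which is what lets a binary clustering network compare them.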
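The ART clustering step can be sketched with a simplified fast-learning ART1 network, the ART variant designed for binary inputs. This is an illustration under stated assumptions, not the authors' implementation: the vigilance `rho`, the choice parameter `beta`, and the class and method names are assumptions, and the rough set step that first assigns new reports to existing clusters is not shown. Because the prototypes persist between calls, feeding a later group of reports can refine existing clusters or open new ones.

```python
from typing import List, Sequence


class ART1:
    """Simplified fast-learning ART1 for binary vectors (illustrative sketch).

    rho:  vigilance in (0, 1]; higher values yield more, tighter clusters (assumed value).
    beta: positive choice parameter that favours prototypes with fewer active bits.
    """

    def __init__(self, rho: float = 0.7, beta: float = 1.0) -> None:
        if not 0.0 < rho <= 1.0:
            raise ValueError("rho must be in (0, 1]")
        self.rho = rho
        self.beta = beta
        self.prototypes: List[List[int]] = []  # one binary prototype per cluster

    def _present(self, x: Sequence[int]) -> int:
        norm_x = sum(x)
        if norm_x == 0:
            raise ValueError("ART1 needs at least one active bit per input")
        # Overlap x AND w_j with every existing prototype.
        overlaps = [[a & b for a, b in zip(x, w)] for w in self.prototypes]
        # Rank clusters by the choice function T_j = |x AND w_j| / (beta + |w_j|).
        order = sorted(
            range(len(self.prototypes)),
            key=lambda j: sum(overlaps[j]) / (self.beta + sum(self.prototypes[j])),
            reverse=True,
        )
        for j in order:
            # Vigilance test: the prototype must cover at least rho of the input's active bits.
            if sum(overlaps[j]) / norm_x >= self.rho:
                self.prototypes[j] = overlaps[j]  # fast learning: w_j <- x AND w_j
                return j
        # No existing cluster passed the vigilance test: create a new cluster from x.
        self.prototypes.append(list(x))
        return len(self.prototypes) - 1

    def partial_fit(self, batch: Sequence[Sequence[int]]) -> List[int]:
        """Cluster one group of reports; learned prototypes carry over to later groups."""
        return [self._present(x) for x in batch]


# Toy binary report vectors, made up for illustration.
art = ART1(rho=0.6)
print(art.partial_fit([[1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]))  # [0, 0, 1, 1]
print(art.partial_fit([[0, 1, 0, 0, 1]]))  # [2]: a later group that fits no cluster opens a new one
```

Raising `rho` makes the vigilance test stricter, so fewer reports are absorbed into existing clusters and more new clusters appear. This is the main knob such a sketch offers for trading cluster granularity against stability.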
The method has been applied to various crime report datasets and validated using several cluster validation indices. It is also compared with several state-of-the-art clustering algorithms to demonstrate its effectiveness and statistical significance on crime corpora.
