Abstract

An important problem in homeland security is identifying suspicious entities in large datasets. Although knowledge discovery and data mining (KDD) offers methods for finding anomalies in numerical datasets, little work has addressed discovering suspicious instances in large, complex semantic graphs whose nodes are richly connected by many different types of links. In this paper, we describe a novel, domain-independent, unsupervised framework for identifying such instances. Beyond discovering suspicious instances, we believe that a complete system must also convince its users by providing understandable explanations of its findings. Therefore, in the second part of the paper, we describe several explanation mechanisms that automatically generate human-understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms, by a large margin, the state-of-the-art unsupervised network algorithms used to analyze the 9/11 terrorist network. Additionally, a human study we conducted demonstrates that our explanation system, which provides natural language explanations for its findings, allowed human subjects to perform complex data analysis much more efficiently and accurately.

