The U.S. State Department Bureau of Counterterrorism officially lists 59 foreign terrorist organizations, while the current Terrorism Research & Analysis Consortium (TRAC) database contains over 3,800 groups. The number of actual groups is constantly changing as new groups emerge and existing groups are redefined, thus motivating a need to automate the rapid generation of multi-faceted group profiles to provide on-demand support for analyst understanding. Robust, automated profiles can be generated for these groups by leveraging current Natural Language Processing (NLP) techniques and large-scale analytics over relevant text (e.g., news stories, social media). Information on key individuals, attack history, group interactions, and more can be extracted and assembled into a dynamic organizational profile. Lockheed Martin Advanced Technology Labs (LM ATL) has developed a prototype system for creating such profiles, based on the publicly released Integrated Crisis Early Warning System (ICEWS) Coded Event Data, a set of over 13 million automatically generated events extracted from public news stories. This set of data has proven valuable for situational awareness and event forecasting, and a more actor-centric view of the data can yield rich details about a group's history and modus operandi. Profile generation, then, is based on the following capabilities: (1) event clustering, (2) event trending, and (3) narrative generation. In this paper, we describe both the framework and analytical components of the Group Profiling Automation for Crime and Terrorism (GPACT) prototype that generates terrorist and criminal group profiles. After describing the overall framework we focus on three analytical capabilities. First, event clustering operates over the set of event data to identify clusters of related events relevant to a particular topic of interest (e.g., interactions with other groups, past attack history), similar to how topic and document clustering operates. 
Second, event trend analysis performs analytics over the clustered event data to surface aggregate patterns detectable in the data. Third, narrative generation uses a template-based approach to natural language generation to construct a textual overview of the organization. We analyze our results and identify ideas for potential future research.
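As an illustration only, the three capabilities named in the abstract can be sketched as a toy pipeline. This is not the GPACT implementation: the event records, field names (`year`, `source`, `target`, `event_type`), and function names are all hypothetical, and grouping by event type is a deliberately simple stand-in for real event clustering.

```python
from collections import Counter, defaultdict

# Hypothetical event records; the field names are illustrative and are
# NOT the ICEWS Coded Event Data schema.
EVENTS = [
    {"year": 2012, "source": "Group A", "target": "Police", "event_type": "attack"},
    {"year": 2013, "source": "Group A", "target": "Army", "event_type": "attack"},
    {"year": 2013, "source": "Group A", "target": "Police", "event_type": "attack"},
    {"year": 2013, "source": "Group A", "target": "Group B", "event_type": "meeting"},
]


def cluster_by_topic(events):
    """Stand-in for event clustering: bucket events by their event_type."""
    clusters = defaultdict(list)
    for event in events:
        clusters[event["event_type"]].append(event)
    return dict(clusters)


def yearly_trend(events):
    """Event trending: count events per year, ordered by year."""
    counts = Counter(event["year"] for event in events)
    return dict(sorted(counts.items()))


def narrate(group, topic, trend):
    """Template-based narrative generation: fill one fixed sentence template."""
    total = sum(trend.values())
    peak_year = max(trend, key=trend.get)
    return (
        f"{group} was involved in {total} {topic} events between "
        f"{min(trend)} and {max(trend)}, peaking in {peak_year} "
        f"with {trend[peak_year]}."
    )


clusters = cluster_by_topic(EVENTS)
attack_trend = yearly_trend(clusters["attack"])
summary = narrate("Group A", "attack", attack_trend)
```

A real system would replace the grouping step with a clustering method over event features and would draw on a library of templates rather than a single sentence, but the flow from clustered events to aggregate counts to filled templates is the same.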