Co-clustering is a generalization of unsupervised clustering that has recently drawn renewed attention, driven by emerging data mining applications in diverse areas. Whereas clustering groups entire columns of a data matrix, co-clustering groups columns over select rows only, i.e., it simultaneously groups rows and columns. The concept generalizes to data "boxes" and higher-way tensors, for simultaneous grouping along multiple modes. Various co-clustering formulations have been proposed, but no workhorse analogous to K-means has emerged. This paper starts from K-means and shows how co-clustering can be formulated as a constrained multilinear decomposition with sparse latent factors. For three- and higher-way data, uniqueness of the multilinear decomposition implies that, unlike matrix co-clustering, it is possible to unravel a large number of possibly overlapping co-clusters. A basic multi-way co-clustering algorithm is proposed that exploits multilinearity using Lasso-type coordinate updates. Various line search schemes are then introduced to speed up convergence, and suitable modifications are proposed to deal with missing values. The imposition of latent sparsity pays a collateral dividend: it turns out that sequentially extracting one co-cluster at a time is almost optimal, hence the approach scales well for large datasets. The resulting algorithms are benchmarked against the state of the art in pertinent simulations, and applied to measured data, including the ENRON e-mail corpus.
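To make the idea concrete, the following is a minimal sketch of sparse rank-one co-cluster extraction in the matrix case: fit X ≈ a bᵀ with sparse factors a and b via alternating soft-thresholded (Lasso-type) coordinate updates, so the supports of a and b index the rows and columns of one co-cluster. This is an illustrative simplification under assumed defaults (the penalty weight `lam`, the iteration count, and the column-magnitude initialization are choices made here), not the paper's exact multi-way algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def extract_cocluster(X, lam=0.1, iters=50):
    """Fit X ~ a b^T with sparse a, b by alternating Lasso-type updates.

    The nonzero entries of a (rows) and b (columns) delimit one co-cluster.
    lam controls sparsity; larger values select fewer rows/columns.
    """
    # Deterministic initialization: column magnitudes, a crude proxy
    # for the leading right singular vector (an assumption of this sketch).
    b = np.abs(X).sum(axis=0)
    b = b / (np.linalg.norm(b) + 1e-12)
    a = np.zeros(X.shape[0])
    for _ in range(iters):
        # With b fixed, each entry of a has a closed-form Lasso solution:
        # soft-threshold the correlation X @ b, then rescale by ||b||^2.
        a = soft_threshold(X @ b, lam) / (b @ b)
        if not np.any(a):
            break  # penalty too strong: no rows selected
        # Symmetric update for b with a fixed.
        b = soft_threshold(X.T @ a, lam) / (a @ a)
        if not np.any(b):
            break
    return a, b
```

In the spirit of the sequential extraction the abstract mentions, one co-cluster at a time can be peeled off by deflating the fitted part (X ← X − a bᵀ) and calling `extract_cocluster` again.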