Digital forensic investigations will become increasingly important in both criminal investigations and civil litigation (e.g., corporate espionage and intellectual property theft) as more of our communication takes place in cyberspace (e.g., via e-mail and social media platforms). In this paper, we present a Natural Language Processing (NLP)-based digital investigation platform. The platform comprises four phases: data collection and representation, vectorization, feature selection, and classifier generation and evaluation. We then demonstrate the potential of the proposed approach on a real-world dataset. The findings indicate that it outperforms two competing approaches: LogAnalysis (published in Expert Systems with Applications, 2014) and SIIMCO (published in IEEE Transactions on Information Forensics and Security, 2016). Specifically, our approach achieves an F1-score of 0.65 and a precision of 0.83, whereas LogAnalysis and SIIMCO achieve F1-scores of 0.51 and 0.59 and precisions of 0.49 and 0.58, respectively.
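To make the reported metrics easier to interpret, the following minimal sketch shows how precision, recall, and F1-score relate. It assumes the standard definition of F1 as the harmonic mean of precision and recall, which the abstract does not state explicitly. The function names `f1_score` and `implied_recall` are hypothetical helpers introduced here for illustration. Because the reported figures are rounded to two decimals, any recall derived from them is only approximate and is not a result reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# (precision, F1) pairs as stated in the abstract, rounded to two decimals.
reported = {
    "proposed": (0.83, 0.65),
    "LogAnalysis": (0.49, 0.51),
    "SIIMCO": (0.58, 0.59),
}

for name, (precision, f1) in reported.items():
    # Derived value only: an approximation from rounded inputs, not a reported result.
    recall = implied_recall(precision, f1)
    print(f"{name}: precision={precision:.2f}, F1={f1:.2f}, implied recall={recall:.2f}")
```

Reading the two metrics together in this way shows where a method's F1-score comes from: a high precision paired with a moderate F1-score implies that recall is the limiting factor.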