Abstract

The prevalence of the Internet and cloud-based applications, alongside the technological evolution of smartphones, tablets and smartwatches, has resulted in users relying upon network connectivity more than ever before. As a consequence, users generate an increasingly large volume of network traffic. For network forensic examiners, this traffic represents a vital source of independent evidence in an environment where anti-forensics is increasingly challenging the validity of computer-based forensics. Network forensics today largely relies on analysis based upon the Internet Protocol (IP) address, as this is the only characteristic available. Typically, however, investigators are not interested in the IP address itself but rather in the associated user (whose account might have been compromised). Given the range of devices (e.g., laptop, mobile, and tablet) that a user might be using and the widespread use of DHCP, the IP address is not a reliable and consistent means of understanding the traffic from a user. This paper presents a novel approach to the identification of users from network traffic using only the meta-data of the traffic (i.e., rather than the payload) and the creation of application-level user interactions, which are shown to provide a far richer discriminatory feature set to enable more reliable identity verification. A study involving data collected from 46 users over a two-month period, which generated over 112 GB of meta-data traffic, was undertaken to examine the novel user-interaction based feature extraction algorithm. On an individual application basis, the approach can achieve recognition rates of 90%, with some users experiencing recognition performance of 100%. This recognition enormously reduces the volume of traffic an investigator has to analyse, allowing them either to focus upon a particular suspect or to disregard that suspect's traffic and focus upon what is left.

Highlights

  • During the past 15 years, Internet usage has experienced explosive growth and technological evolution – from a simple data network with around 500 million users to a multipurpose and multiservice platform with almost 3.2 billion users (Internetlivestats, 2015)

  • To provide scientific rigour and statistical reliability, the following criteria were established: (a) The dataset must contain a sufficient number of participants to provide a basis for identifying them; (b) The dataset must contain sufficient samples across a prolonged period in order to ensure identification performance can be maintained; (c) All network traffic meta-data from all participants is to be collected; (d) The Internet Protocol (IP) address and user must be fixed for the complete duration in order to provide a ground truth to which to label the interactions and calculate the performance

  • This paper has presented and evaluated a novel feature extraction approach for network traffic that provides robust user identification
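
Criterion (d) above relies on a fixed IP-to-user assignment to label traffic and to score identification. The following is a minimal sketch of that labelling and scoring step only, not the paper's feature extraction algorithm. The record fields (`src_ip`, `app`, `bytes`), the addresses, the user IDs and the placeholder predictions are all hypothetical and chosen purely for illustration.

```python
# Hypothetical fixed IP-to-user assignment (criterion d): one user per
# address for the whole collection period. All values are made up.
IP_TO_USER = {
    "10.0.0.11": "user_01",
    "10.0.0.12": "user_02",
}


def label_records(records, ip_to_user):
    """Attach the ground-truth user to each metadata record via its source IP.

    Records whose source address is not in the mapping are dropped,
    because no ground truth exists for them.
    """
    labelled = []
    for rec in records:
        user = ip_to_user.get(rec["src_ip"])
        if user is not None:
            labelled.append({**rec, "user": user})
    return labelled


def recognition_rate(labelled, predictions):
    """Fraction of labelled records whose predicted user matches ground truth."""
    if not labelled:
        return 0.0
    correct = sum(
        1 for rec, pred in zip(labelled, predictions) if rec["user"] == pred
    )
    return correct / len(labelled)


# Hypothetical metadata records (no payload), one per observed interaction.
records = [
    {"src_ip": "10.0.0.11", "app": "video", "bytes": 52000},
    {"src_ip": "10.0.0.12", "app": "chat", "bytes": 900},
    {"src_ip": "10.0.0.11", "app": "chat", "bytes": 1200},
    {"src_ip": "10.0.0.99", "app": "video", "bytes": 300},  # unknown address
]

labelled = label_records(records, IP_TO_USER)
# Placeholder output standing in for a classifier; not a real model.
predictions = ["user_01", "user_02", "user_02"]
rate = recognition_rate(labelled, predictions)
print(f"{len(labelled)} labelled records, recognition rate {rate:.2f}")
```

Keeping the mapping fixed is what makes the label trustworthy: if an address were reassigned mid-collection (for example by DHCP), the same lookup would silently attribute one user's traffic to another.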

Summary

Introduction

During the past 15 years, Internet usage has experienced explosive growth and technological evolution – from a simple data network with around 500 million users to a multipurpose and multiservice platform with almost 3.2 billion users (Internetlivestats, 2015). Studies into behavioural profiling on desktop and mobile platforms have demonstrated the ability to verify an individual; however, deriving application-level interactions (such as which websites users visit and, more importantly, what they do whilst visiting: posting, chatting, listening to music or watching video) from low-level encrypted packet-based data has proven challenging. Using these application-based interactions for identification rather than verification introduces a need for stronger discriminative information.

Prior art in network and behavioural profiling
Packet based network analysis method
Flow based network analysis approach
Biometric-based behavioural profiling
Deriving user interactions from network metadata
Network data collection dataset
Data collection
Data pre-processing
User identification via network interactions
Preliminary experiment: classification configuration
Experiment
Findings
Discussion
Conclusion and future work