Using Visual Analytics to Discover Bot Traffic

Ih Herlambang

doi:10.24377/ljmu.t.00004593

Abstract

With the advance of technology, the Internet has become a medium for many malicious activities. Bot traffic has increased greatly and causes significant problems for businesses and organisations; examples include spam bots, scraper bots, distributed denial of service bots and adaptive bots that aim to exploit the vulnerabilities of a website. Discriminating bot traffic from legitimate flash crowds remains an open challenge.

To address these issues and enhance security awareness, this thesis proposes an interactive visual analytics system for discovering bot traffic. The system provides an interactive visualisation with details-on-demand capabilities, which enables knowledge discovery from very large datasets and lets an analyst examine comprehensive details without being constrained by dataset size. A dashboard view represents legitimate and bot traffic using a Quadtree data structure and Voronoi diagrams. The main contribution of this thesis is a novel visual analytics system that is capable of discovering bot traffic.

The research began with a literature review to gain a systematic understanding of the research area, and then used experiment and simulation approaches. The experiment involved capturing website traffic, identifying browser fingerprints, simulating bot attacks and analysing the mouse dynamics, such as movements and events, of participants. Data were captured as the participants performed a list of tasks, such as responding to a banner. The data collection was transparent to the participants and only required JavaScript to be enabled on the client side. The study involved 10 participants who were familiar with the Internet. To analyse the data, Weka 3.6.10 was used to perform classification based on a training dataset, and the test dataset of all participants was evaluated using a built-in decision tree algorithm. The results of classifying the test dataset were promising: the model was able to identify ten participants and six simulated bot attacks with an accuracy of 86.67%. Finally, the visual analytics design was formulated to assist an analyst in discovering bot presence.
