Improving the performance of Machine Learning Algorithms for TOR detection

Adityan Gurunarayanan, Ashutosh Bhatia, Deepak Kumar Vishwakarma, Ankit Agrawal

doi:10.1109/icoin50884.2021.9333989

Abstract

The Onion Router (TOR) network provides anonymity, in terms of identity and location, to Internet users by encrypting traffic multiple times along the path and routing it through an overlay network of servers. Although TOR was initially developed to protect users' privacy, cyber criminals and hackers take advantage of this anonymity, and as a result many illegal activities are carried out over TOR. With the ever-changing landscape of Internet services, traditional traffic analysis methods are not efficient for analyzing encrypted traffic, so alternative methods are needed for analyzing TOR traffic. In this paper, we develop a machine learning model that classifies a given network traffic flow as TOR or nonTOR. We train the model on the ISCX2016 TOR-nonTOR dataset and apply random oversampling and random undersampling to correct the class imbalance in the data. Furthermore, to improve the efficiency of our classifiers, we tune their hyperparameters using grid search with k-fold cross-validation. Results show that we achieve more than 90% accuracy with random sampling and hyperparameter tuning.
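The resampling and tuning steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the two-feature flows are synthetic stand-ins rather than ISCX2016 records, the k-nearest-neighbour classifier is chosen here only for brevity, and all function names (`random_oversample`, `random_undersample`, `cross_val_accuracy`, `grid_search`) are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect feature rows under their class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of larger classes down to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def knn_predict(train_rows, train_labels, row, n_neighbors):
    """Majority vote among the n_neighbors closest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, row)), lab)
        for t, lab in zip(train_rows, train_labels)
    )
    votes = Counter(lab for _, lab in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(rows, labels, n_neighbors, k=5, seed=0):
    """Mean accuracy over k shuffled, disjoint validation folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        hits = sum(
            knn_predict(tr_rows, tr_labels, rows[i], n_neighbors) == labels[i]
            for i in fold
        )
        scores.append(hits / len(fold))
    return sum(scores) / k


def grid_search(rows, labels, candidates, k=5):
    """Score every candidate value with k-fold CV and return the best one."""
    scores = {c: cross_val_accuracy(rows, labels, c, k) for c in candidates}
    return max(scores, key=scores.get), scores


# Synthetic, imbalanced toy data (assumption: NOT taken from ISCX2016).
rng = random.Random(42)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
rows += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]
labels = ["nonTOR"] * 200 + ["TOR"] * 20

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
best_n, cv_scores = grid_search(under_rows, under_labels, [1, 3, 5])
```

In this sketch the grid search runs on the undersampled set. If oversampling is used instead, the duplication is usually applied inside each training fold, since copies of the same row appearing in both the training and validation folds would inflate the cross-validated accuracy.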

Full Text