Experimental Evaluation of Malware Family Classification Methods from Sequential Information of TLS-Encrypted Traffic

Joonseo Ha, Heejun Roh

doi:10.3390/electronics10243180

Abstract

In parallel with the rapid adoption of transport layer security (TLS), malware has utilized the encrypted communication channel provided by TLS to hinder detection from network traffic. To this end, recent research efforts are directed toward malware detection and malware family classification for TLS-encrypted traffic. However, among their feature sets, the proposals to utilize the sequential information of each TLS session have not been properly evaluated, especially in the context of malware family classification. In this context, we propose a systematic framework to evaluate the state-of-the-art malware family classification methods for TLS-encrypted traffic in a controlled environment and discuss the advantages and limitations of the methods comprehensively. In particular, our experimental results for the 10 representation and classifier combinations show that the graph-based representation for the sequential information achieves better performance regardless of the evaluated classification algorithms. With our framework and findings, researchers can design better machine learning-based classifiers.

Highlights

Shen et al. [11] proposed the notion of the traffic interaction graph (TIG) to represent a packet length sequence with directions and introduced graph neural network-based representation learning for distributed application classification, called GraphDApp.
While a majority of recent malware detection and malware family classification methods utilize a subset of the enhanced flow features, which can be exported by network devices [20], collecting such features may be inefficient in some scenarios, especially when there is no careful feature selection (e.g., [34]).
For the classification accuracy ranking, most of the results can be expected from existing research efforts [8,9,11]; those efforts mainly focus on application classification and malware detection.

Summary

Introduction

While the secure sockets layer (SSL), an encryption protocol designed for web applications, has been in use since the broad adoption of the internet in the 1990s, SSL and its successor transport layer security (TLS) accounted for less than half of web traffic until the mid-2010s [1]. While feature representation and learning for the classifiers are important issues in machine learning applications [12], existing research efforts in malware family classification rarely report performance comparisons among different feature representations and learning approaches. To this end, in this article, we propose a systematic framework to evaluate malware family classification methods for TLS-encrypted traffic in a controlled environment. To evaluate the existing research efforts with different feature representation and learning fairly in a common environment, we utilize the framework to extract a common flow-level feature (i.e., flow length sequence and directions) from TLS-encrypted traffic and evaluate several malware family classification methods.
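To make the idea of a length sequence with directions more concrete, the following is a minimal sketch of one possible encoding, not the authors' implementation. It assumes a flow is already available as a list of (direction, length) pairs; the direction labels, the function name `signed_length_sequence`, the `max_len` parameter, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical direction labels; the source does not prescribe an encoding.
CLIENT_TO_SERVER = "c2s"
SERVER_TO_CLIENT = "s2c"


def signed_length_sequence(
    packets: Sequence[Tuple[str, int]], max_len: int = 8
) -> List[int]:
    """Encode (direction, length) pairs as a fixed-size signed sequence.

    Client-to-server lengths stay positive, server-to-client lengths become
    negative, and the result is truncated or zero-padded to max_len entries.
    """
    signed = []
    for direction, length in packets[:max_len]:
        if direction == CLIENT_TO_SERVER:
            signed.append(length)
        elif direction == SERVER_TO_CLIENT:
            signed.append(-length)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    # Zero-pad short flows so every flow yields a vector of the same size.
    signed.extend([0] * (max_len - len(signed)))
    return signed


# Made-up example flow, not taken from the paper's dataset.
example_flow = [("c2s", 517), ("s2c", 1460), ("s2c", 1200), ("c2s", 126)]
print(signed_length_sequence(example_flow, max_len=6))
# prints [517, -1460, -1200, 126, 0, 0]
```

A fixed-size signed vector like this is only one way to feed such a sequence to a classifier; a graph-based representation would instead keep the ordering and direction changes as explicit structure.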

Backgrounds and Related Work

Early Encrypted Traffic Classification Methods

Exploiting Sequential Information of TLS Flow

Fine-Grained Classification for TLS-Encrypted Traffic in Mobile Apps

Malware Detection and Family Classification from TLS-Encrypted Traffic

Lack of Malware Family Dataset

Need for Evaluation Based on Packet Length Sequences

Framework Overview

Feature Representations

Classification Algorithms

Traffic Dataset

Accuracy and F1 Score of the State-of-the-Art Methods

Confusion Matrices without Noisy Labels

ROC Curves and AUC Values

Non-Parametric Friedman Test and Post-Hoc Nemenyi Test

Training Time and Testing Time

Performance Evaluation with Noisy Labels

Discussion

Conclusions

Journal: Electronics
Publication Date: Dec 20, 2021
Citations: 4
License type: CC BY 4.0

Similar Papers

Two-layer detection framework with a high accuracy and efficiency for a malware family over the TLS protocol.
Rongfeng Zheng ... Mehmet Hadi Gunes
PLoS ONE | VOL. 15 | 06 May 2020

Enhanced Android Malware Detection and Family Classification, using Conversation-level Network Traffic Features
Mohammad Abuthawabeh ... Khaled Mahmoud
The International Arab Journal of Information Technology | VOL. 17 | 31 Jul 2020

Predictive Eviction: A Novel Policy for Optimizing TLS Session Cache Performance
Ryan Stevens ... Hao Chen
01 Dec 2015

Predictive Eviction: A Novel Policy for Optimizing TLS Session Cache Performance
Ryan Stevens ... Hao Chen
01 Dec 2014
