The Effect on Network Flows-Based Features and Training Set Size on Malware Detection

Jarilyn M Hernandez Jimenez, Katerina Goseva-Popstojanova

doi:10.1109/nca.2018.8548325

Abstract

Although network flows have been used in areas such as network traffic analysis and botnet detection, few studies have used network flows-based features for malware detection. This paper focuses on malware detection using features extracted from network traffic and system logs. We evaluated the performance of four supervised machine learning algorithms (J48, Random Forest, Naive Bayes, and PART) for malware detection and identified the best learner. Furthermore, we used feature selection based on information gain to identify the smallest number of features needed for classification. In addition, we experimented with training sets of different sizes. The main findings are: (1) Adding network flows-based features significantly improved malware detection performance. (2) J48 and PART were the best-performing learners, with the highest F-score and G-score values. (3) With J48, the top five features ranked by information gain attained the same performance as all 88 features; with PART, the top fourteen features ranked by information gain did. Neither of these two models included any system logs-based features. (4) Classification performance when training on 75% of the data was comparable to training on 90% of the data. As little as 25% of the data can be used for training at the cost of a modest performance degradation (less than 7% for F-score and 6% for G-score, compared with training on 90% of the data).
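To illustrate what ranking features by information gain involves, the following is a minimal sketch and not the authors' implementation. It assumes every feature has already been discretized into categorical values. The function names (`entropy`, `information_gain`, `rank_features`) and the toy feature names are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Return feature names sorted from highest to lowest information gain."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy data with hypothetical feature names (assumption: values are already binned).
labels = ["malware", "malware", "benign", "benign"]
table = {
    "flow_duration_bin": [1, 1, 0, 0],  # separates the two classes perfectly
    "log_event_bin": [1, 0, 1, 0],      # carries no information about the class
}
ranking = rank_features(table, labels)
```

Keeping only the first k names in `ranking` and retraining a classifier on them is one way to search for the smallest feature subset that preserves performance, which is the kind of comparison the abstract reports for the top five and top fourteen features.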
