Impact of benign sample size on binary classification accuracy

Mamoru Mimura

doi:10.1016/j.eswa.2022.118630

Abstract

Recently, there has been a significant increase in malware attacks and malicious traffic. Consequently, several machine learning-based models have been developed to detect them. However, the detection accuracy of these models is evaluated using different methodologies and datasets, and some studies overstate their detection rates. The lack of a common testing approach, coupled with the limited datasets used in the experiments, makes it difficult to compare these models and identify those that provide superior detection accuracy. Only a few studies have focused on benign samples and their effect on detection accuracy. Because the datasets used in such experiments generally consist of benign and malicious samples, the machine learning models perform binary classification. In this setting, the number of benign samples affects the classification accuracy for malicious samples; that is, it can either improve or degrade detection accuracy. In this study, we propose a novel metric for evaluating the accuracy degradation caused by increasing the benign sample size. We mainly used the FFRI dataset, which consists of 11,243 malware samples and 250,000 benign samples, and evaluated classification accuracy using strings extracted from the malware. We also used additional malware samples to supplement the main dataset. Increasing the number of benign test samples tenfold, while keeping the numbers of malicious samples and benign training samples fixed, decreased the F1 score by 0.293. Furthermore, we confirmed that a sufficiently large benign training set mitigates this accuracy degradation. Our metric can help evaluate the benign sample size needed for binary classification and compare detection accuracy.
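Why a larger benign test set can lower F1 even when the classifier itself is unchanged can be shown with a back-of-the-envelope calculation. The sketch below is not the paper's proposed metric. It assumes the classifier's true-positive rate (TPR) and false-positive rate (FPR) stay constant when only the benign test set grows, so the number of false positives scales with the benign count while true positives do not. The function name `f1_at_scale` and all counts and rates are hypothetical, not taken from the FFRI experiments.

```python
def f1_at_scale(n_malicious: int, n_benign: int, tpr: float, fpr: float) -> float:
    """F1 for the malicious (positive) class, assuming fixed per-class rates.

    Hypothetical helper for illustration only: TPR and FPR are assumed not to
    change as the benign test set grows.
    """
    tp = tpr * n_malicious      # malware correctly flagged
    fn = n_malicious - tp       # malware missed
    fp = fpr * n_benign         # benign samples wrongly flagged; grows with n_benign
    # Equivalent to 2 * precision * recall / (precision + recall)
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Hypothetical setting: 1,000 malware samples, TPR = 0.9, FPR = 0.01
    base = f1_at_scale(1_000, 10_000, tpr=0.9, fpr=0.01)
    scaled = f1_at_scale(1_000, 100_000, tpr=0.9, fpr=0.01)  # benign test set x10
    print(f"F1 with 10,000 benign:  {base:.3f}")
    print(f"F1 with 100,000 benign: {scaled:.3f}")
```

With these assumed values, recall stays at 0.9, but false positives rise from 100 to 1,000, so precision falls from 0.9 to roughly 0.47 and F1 drops from 0.900 to about 0.621. In practice the FPR need not stay constant, which is one reason the size of the benign training set also matters.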

Journal: Expert Systems With Applications
Publication Date: Aug 27, 2022
Citations: 5
License type: cc-by

Similar Papers

Detecting Electricity Fraud in the Net-Metering System Using Deep Learning
Mahmoud M Badr ... Mohamed Mahmoud
31 Oct 2021

An Improved Method of Detecting Macro Malware on an Imbalanced Dataset
Mamoru Mimura
IEEE Access | VOL. 8
01 Jan 2020

An analysis on breast tissue characterization in combined transform domain using nearest neighbor classifiers
B.N Prathibha ... V Sadasivam
01 Mar 2011

FlowMine: Android app analysis via data flow
Lovely Sinha ... Parvez Faruki
01 Jan 2015
