Abstract

Malware diagnosis is one of today’s most popular topics of machine learning. Instead of simply applying all the classical classification algorithms to the problem and claim the highest accuracy as the result of prediction, which is the typical approach adopted by studies of this kind, we stick to the Support Vector Machine (SVM) classifier and based on our observation of some principles of learning, characteristics of statistics and the behavior of SVM, we employed a number of the potential preprocessing or ensemble methods including rescaling, bagging and clustering that may enhance the performance to the classical algorithm. We implemented the idea of rescaling by iteratively magnifying the attributes used by the support vectors of SVM and eliminating those unused ones from the training data examples until a maximum accuracy is achieved. Our study of bagging and clustering focused on the situation where only examples of malware are available and one-class SVM is used. For both methods, a group of models is built using part of the training data instead of building one model with the whole training data set. We also compared the effect of two possible coordination approaches for the sub-models acquired in the training process, namely, voting and one positive to be positive. Results of experiments showed that when utilized together with appropriate coordination methods, ensemble methods can effectively decrease both the cases where malware is labeled as clean or clean software is classified as malware, which are formally known as false-negative and false-positive errors in our context respectively.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.