Abstract

In recent years, the number of malware samples and infected hosts has increased exponentially, causing great losses to governments, enterprises, and individuals. Traditional technologies usually try to detect malware before a host is infected. As a result, they have difficulty promptly detecting malware that has been deformed, obfuscated, or modified. Detecting hosts during malware infection can make up for this deficiency. Moreover, an infected host usually sends connection requests to the command and control (C&C) server over HTTP, which generates malicious external traffic. Thus, a host that produces malicious external traffic is likely to be infected by malware. Against this background, this paper combines HTTP traffic with the eXtreme Gradient Boosting (XGBoost) algorithm to detect infected hosts, in order to improve detection efficiency and accuracy. The proposed approach uses an automatic template generation algorithm to generate feature templates for HTTP headers, and it uses XGBoost to distinguish malicious traffic from normal traffic. We conduct a performance analysis on a dataset that combines malware traffic from MALWARE-TRAFFIC-ANALYSIS.NET with normal traffic from UNSW-NB 15, and it demonstrates that our approach is efficient. Experimental results show that the detection speed is about 1859 HTTP traffic samples per second and the detection accuracy reaches 98.72%. The false positive rate is less than 1%.
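The abstract says that feature templates are generated automatically from HTTP headers, but it does not describe the algorithm. The following is a minimal sketch of one common way to build such templates, under stated assumptions. It splits each request line and header line on URL and header delimiters. It then replaces tokens that look variable (IPv4 addresses, integers, long hex strings) with type placeholders. The token classes and the names `abstract_token` and `header_template` are hypothetical illustrations, not the paper's method.

```python
import re

# Hypothetical token classes: the abstract does not specify how the paper's
# automatic template generation algorithm decides which tokens are variable.
_TOKEN_CLASSES = [
    ("<IP>", re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")),
    ("<NUM>", re.compile(r"\d+")),
    ("<HEX>", re.compile(r"[0-9a-fA-F]{8,}")),
]
# The capturing group makes re.split keep the delimiters, so templates stay readable.
_DELIMITERS = re.compile(r"([/?=&\s;,:])")


def abstract_token(token: str) -> str:
    """Replace a variable-looking token with a type placeholder; keep others as-is."""
    for placeholder, pattern in _TOKEN_CLASSES:
        if pattern.fullmatch(token):
            return placeholder
    return token


def header_template(raw_request: str) -> list[str]:
    """Map the request line and each header line of a raw HTTP request to a template."""
    head = raw_request.split("\r\n\r\n", 1)[0]  # drop any request body
    template = []
    for line in head.split("\r\n"):
        tokens = _DELIMITERS.split(line)
        template.append("".join(abstract_token(t) for t in tokens))
    return template


raw = ("GET /gate.php?id=4f2a9c1b7e3d&n=17 HTTP/1.1\r\n"
       "Host: 10.0.0.5\r\nUser-Agent: Mozilla/4.0\r\n\r\n")
print(header_template(raw))
# ['GET /gate.php?id=<HEX>&n=<NUM> HTTP/1.1', 'Host: <IP>', 'User-Agent: Mozilla/4.0']
```

Two beacon requests that differ only in their bot ID, counter, or server IP collapse to the same template. Templates of this kind could then be encoded as features for a classifier such as XGBoost. How the paper actually encodes them is not stated in this section.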

Highlights

  • With the boom of the Internet and the popularity of computers, today's computers face serious security problems, the biggest cause of which is the explosive growth of malicious code. Malicious code is computer code that individuals or organizations intentionally write to pose a security risk to a computer or network.

  • McAfee Labs records an average of eight new malware samples per second, a significant increase from the four new samples per second recorded in the third quarter [7]. Malware brings huge economic losses to users, and its rapid changes put great pressure on anti-malware technology. Current technology has difficulty detecting malware before the host is infected.

  • After an attacker infects a host with malware, the controlled host sends a connection request to the command and control (C&C) server. The traffic generated by this connection is malicious external traffic.


Summary

Introduction

With the boom of the Internet and the popularity of computers, today's computers face serious security problems, the biggest cause of which is the explosive growth of malicious code. Malicious code is computer code that individuals or organizations intentionally write to pose a security risk to a computer or network. Existing solutions for detecting malicious external traffic fall into two categories. The first filters malicious domain names against blacklists, and the second uses rules to match malicious external traffic. Both have limitations. The blacklist-based filtering scheme identifies malicious external traffic only when a host connects to a known malicious website, and it cannot perceive domain name changes. The rule-based (feature detection) scheme requires security practitioners to analyze samples one by one. This consumes considerable manpower, and the scheme has difficulty detecting the malicious external traffic of malware variants. The main contributions are as follows. (1) We propose an approach that combines machine learning with HTTP header templates to discover traffic involved in malware infection, and we implement it as the MalDetector system.
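The blacklist limitation described above can be shown with a small sketch. In this sketch, blacklist filtering is reduced to an exact host-name lookup. The domain names (under the reserved `.example` TLD) and the function name `is_blacklisted` are hypothetical. Real deployments typically draw their blacklists from threat-intelligence feeds.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of known C&C domains (reserved .example names).
KNOWN_BAD_DOMAINS = {"c2-alpha.example", "c2-beta.example"}


def is_blacklisted(url: str) -> bool:
    """Exact host-name lookup, the core of blacklist-based filtering."""
    host = urlsplit(url).hostname  # lower-cased by urllib; None if the URL has no host
    return host is not None and host in KNOWN_BAD_DOMAINS


# The same beacon request before and after the operator moves to a new domain.
old_beacon = "http://c2-alpha.example/gate.php?id=4f2a9c1b7e3d"
new_beacon = "http://c2-gamma.example/gate.php?id=4f2a9c1b7e3d"
print(is_blacklisted(old_beacon))  # True
print(is_blacklisted(new_beacon))  # False: the domain change goes unnoticed
```

The path and query of the request are identical in both cases. Only the domain changed. A detector that looks at the structure of the HTTP request, rather than at the destination name alone, can still see the similarity.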

Related Work
[Figures and tables from the full text: list of infected hosts; request types; word positions; number of HTTP requests in malicious and normal traffic; precision and detection rate; normal-traffic and malware-traffic detection rates]
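The evaluation reports accuracy and false positive rate, along with precision and per-class detection rates for normal and malware traffic. This section does not give the formulas. The sketch below therefore uses the standard confusion-matrix definitions, assuming that a class's "detection rate" means recall on that class. The class `ConfusionCounts` and the counts in the example are hypothetical and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary malicious-vs-normal traffic classifier."""
    tp: int  # malicious traffic flagged as malicious
    fp: int  # normal traffic flagged as malicious
    tn: int  # normal traffic passed as normal
    fn: int  # malicious traffic passed as normal

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def malware_detection_rate(self) -> float:
        # Assumed definition: recall on the malicious class (true positive rate).
        return self.tp / (self.tp + self.fn)

    def normal_detection_rate(self) -> float:
        # Assumed definition: recall on the normal class (true negative rate).
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        # Share of normal traffic wrongly flagged; equals 1 - normal_detection_rate().
        return self.fp / (self.fp + self.tn)


# Made-up counts for illustration only; not the paper's results.
c = ConfusionCounts(tp=90, fp=5, tn=895, fn=10)
print(f"{c.accuracy():.3f}")                # 0.985
print(f"{c.malware_detection_rate():.3f}")  # 0.900
print(f"{c.false_positive_rate():.4f}")     # 0.0056
```

Under these definitions, a false positive rate below 1% means that fewer than 1 in 100 normal HTTP traffic samples is flagged as malicious.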