Using XGBoost to Discover Infected Hosts Based on HTTP Traffic

Weina Niu, Heng Wu, Teng Hu, Tianyu Jiang, Xiaosong Zhang, Ting Li

doi:10.1155/2019/2182615

Abstract

In recent years, the number of malware samples and infected hosts has increased exponentially, causing great losses to governments, enterprises, and individuals. However, traditional technologies struggle to detect, in a timely manner, malware that has been deformed, obfuscated, or modified, since they usually inspect hosts before infection. Detecting hosts during malware infection can make up for this deficiency. Moreover, an infected host usually sends a connection request to the command and control (C&C) server over HTTP, which generates malicious external traffic. Thus, a host found to generate malicious external traffic may be infected by malware. Against this background, this paper combines HTTP traffic with the eXtreme Gradient Boosting (XGBoost) algorithm to detect infected hosts, in order to improve detection efficiency and accuracy. The proposed approach uses an automatic template generation algorithm to generate feature templates for HTTP headers and uses XGBoost to distinguish malicious traffic from normal traffic. We conduct a performance analysis on a dataset that includes malware traffic from MALWARE-TRAFFIC-ANALYSIS.NET and normal traffic from UNSW-NB 15 to demonstrate that our approach is efficient. Experimental results show that the detection speed is about 1859 HTTP traffic records per second, the detection accuracy reaches 98.72%, and the false positive rate is less than 1%.
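The accuracy and false positive rate quoted above follow the usual confusion-matrix definitions, which can be sketched as below. This is a minimal illustration: the function name `detection_metrics` and the counts are hypothetical and are not the paper's experimental data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp: malicious records flagged as malicious
    fp: normal records wrongly flagged as malicious
    tn: normal records correctly passed
    fn: malicious records that were missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # False positive rate: share of normal traffic that is wrongly flagged.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    # Detection rate (recall): share of malicious traffic that is caught.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "false_positive_rate": false_positive_rate,
        "detection_rate": detection_rate,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=480, fp=5, tn=495, fn=20)
```

With these made-up counts, 975 of 1000 records are classified correctly and 5 of 500 normal records are misflagged, so a false positive rate "below 1%" would require fewer than 5 false alarms per 500 normal records.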

Highlights

With the booming of the Internet and the popularity of computers, today’s computers face serious security problems, the biggest cause of which is the explosive growth of malicious code. Malicious code refers to computer code that is intentionally written by individuals or organizations to pose a security risk to a computer or network.
McAfee Labs records an average of eight new malware samples per second, a significant increase from the four new samples recorded in the third quarter [7]. Malware brings huge economic losses to users, and its rapid changes have brought great trouble and pressure to the antikilling technology of malicious programs. Current technology has difficulty detecting malware before the host is infected.
After an attacker attacks the host with malware, the controlled host sends a connection request to the command and control (C&C) server. The traffic generated by the connection is malicious external traffic.

Summary

Introduction

With the booming of the Internet and the popularity of computers, today’s computers face serious security problems, the biggest cause of which is the explosive growth of malicious code. Malicious code refers to computer code that is intentionally written by individuals or organizations to pose a security risk to a computer or network. Two solutions are available for identifying malicious external traffic. One is to filter malicious domain names based on blacklists, and the other is to use rules to match malicious external traffic. Both of these solutions have certain limitations. The blacklist-based filtering scheme can only identify malicious external traffic when the host connects to a known malicious website, and it cannot perceive domain name changes. The feature-based detection scheme requires security practitioners to analyze samples one by one, which consumes considerable manpower and makes it difficult to detect the malicious external connection traffic of variants. (1) We propose an approach that combines machine learning and HTTP header templates to discover traffic involved in malware infection and develop it into the MalDetector system.
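The blacklist limitation described above can be illustrated with a short sketch. Everything here is hypothetical: the domain names use the reserved `.example` suffix, and `BLACKLIST` and `is_blacklisted` are illustrative names, not part of MalDetector.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of known C&C domains (assumption for illustration).
BLACKLIST = {"c2-server.example", "bad-updates.example"}


def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the URL's host is an exact match in the blacklist."""
    host = urlsplit(url).hostname or ""
    return host.lower() in blacklist


# Same request path and query, but the second URL uses a changed domain.
known = "http://c2-server.example/gate.php?id=42"
moved = "http://c2-server-2.example/gate.php?id=42"

known_hit = is_blacklisted(known)
moved_hit = is_blacklisted(moved)
```

In this sketch `known_hit` is true and `moved_hit` is false even though the two requests differ only in the domain name. This is the gap that features drawn from the rest of the HTTP header are meant to close.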

Related Work

[Figure and table labels: List of infected hosts; Request type; Word pos; Malicious traffic, Normal traffic, The number of HTTP requests; Precision, Detection rate; Normal traffic detection rate, Malware traffic detection rate]


Journal: Security and Communication Networks	Publication Date: Nov 6, 2019
Citations: 4	License type: CC BY 4.0
