Abstract

The development of new technologies has made computers one of the most popular electronic products. However, there are always people who try to take advantage of others by attacking their computers. To limit property damage as much as possible, precise and efficient malware detection is essential. This work searches for an effective detection approach using a dataset generated by combining heartbeat and threat reports collected by Microsoft’s endpoint protection solution. Since the dataset is large and contains many categorical variables, memory reduction and label encoding are applied during data cleaning. To handle the dimensionality problem and improve training efficiency, Chi-square testing is then applied and the top 42 fields are selected. Three algorithms (Logistic Regression, KNN and LightGBM) are used to build models, and the results of each are reported. The results show that the LightGBM model performs best, reaching an AUC of 0.720687, and is also the most time-efficient. Finally, based on the feature importance given by the LightGBM algorithm, this work picks the three most important variables to analyze the underlying causes of malware attacks. One of the findings is that computers whose anti-virus software has bugs or flaws suffer more attacks.
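The label-encoding and Chi-square selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Pearson chi-square test of independence between each encoded categorical field and the binary label, and the column names (`av_state`, `os`, `region`), the toy data, and the helper functions are all hypothetical. The study keeps the top 42 fields; the toy example keeps 2.

```python
from collections import Counter


def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def chi_square(feature, target):
    """Pearson chi-square statistic of the feature-by-target contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    feature_totals = Counter(feature)
    target_totals = Counter(target)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n
            observed = joint.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def top_k_features(columns, target, k):
    """Rank fields by chi-square score against the label and keep the k highest."""
    scores = {
        name: chi_square(label_encode(col), target)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = malware detected on the machine, 0 = clean.
target = [1, 1, 1, 0, 0, 0, 1, 0]
columns = {
    "av_state": ["bad", "bad", "bad", "ok", "ok", "ok", "bad", "ok"],
    "os": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "region": ["x", "x", "y", "y", "x", "x", "y", "y"],
}

selected = top_k_features(columns, target, k=2)
print(selected)  # ['av_state', 'os']
```

Here `av_state` is perfectly associated with the label and scores highest, while `region` is independent of it and is dropped. On a dataset of the size described, a library routine would normally replace these pure-Python loops.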
