Abstract

Web surfing is now an integral part of everyday life, and users want to protect their data from thieves and malicious web pages. This paper therefore proposes a solution to the problem of discriminating between malicious and benign websites with the desired accuracy. We propose using machine learning methods to classify websites as malicious or benign based on URL and other host-based features. State-of-the-art gradient-boosted decision trees are proposed for this task and compared with well-known machine learning methods such as random forest and multilayer perceptron. All of the methods achieved an accuracy higher than 95% on this problem, and the proposed gradient-boosted decision trees outperformed the random forest and neural network approaches in terms of both overall accuracy and F1-score.
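The abstract compares classifiers by overall accuracy and F1-score. As a minimal sketch of how these two metrics are computed for a binary malicious/benign task, the following uses only the Python standard library. The function name `accuracy_and_f1` and the label vectors are hypothetical illustrations, not data or code from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class treated as "malicious" (an assumption
    made for this illustration).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels: 1 = malicious, 0 = benign.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Reporting F1 alongside accuracy matters when benign sites greatly outnumber malicious ones, because a classifier can then reach high accuracy while still missing many malicious pages.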

Highlights

  • [...] machine learning techniques for malicious Uniform Resource Locator (URL) detection [5], [6]

  • In 2010, the population of Internet users was about two billion [1], and by the end of June 2019 it had reached more than 4.5 billion [2]

  • In this paper, we propose machine learning methods for classifying websites as malicious or benign based on the URL itself and utilizing additional [...]

  • WHOIS_STATEPRO: a categorical variable whose values are the states we got from the server


Summary

Introduction

The popularity of the Internet grows every year. In 2010, the population of Internet users was about two billion [1], and by the end of June 2019 it had reached more than 4.5 billion [2]. These standard approaches have issues when new attacks are observed due to [...]

[...] machine learning techniques for malicious Uniform Resource Locator (URL) detection [5], [6]. For example, in [6], [7] the authors used only URL information for feature extraction by machine learning approaches. In this paper, we propose machine learning methods for classifying websites as malicious or benign based on the URL itself and utilizing additional [...]

[...] its values are the countries we got from the server response.
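The text mentions extracting features from the URL itself and using categorical host-based variables, such as the country obtained from the server. As a rough sketch of what such preprocessing could look like, the following derives a few simple lexical features from a URL and one-hot encodes a categorical value. The feature names, the helper functions `url_features` and `one_hot`, and the category list are assumptions made for illustration. They are not the paper's actual feature set.

```python
from urllib.parse import urlsplit


def url_features(url):
    """Compute a few illustrative lexical features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        # Characters that are neither letters nor digits.
        "num_special_chars": sum(1 for ch in url if not ch.isalnum()),
        "host_length": len(host),
        "num_host_dots": host.count("."),
        "uses_https": int(parts.scheme == "https"),
        "has_query": int(bool(parts.query)),
    }


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [int(value == c) for c in categories]


# Hypothetical inputs, not taken from the paper's dataset.
feats = url_features("http://example.com/login?id=1")
country_vec = one_hot("CA", ["US", "CA", "Other"])
print(feats)
print(country_vec)
```

Tree-based models such as gradient-boosted decision trees and random forests can consume numeric features like these directly, while a multilayer perceptron usually benefits from scaling them first.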

