Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection

Betul Altay, Tansel Dokeroglu, Ahmet Cosar

doi:10.1007/s00500-018-3066-4

Abstract

Conventional malicious webpage detection methods rely on blacklists to decide whether a webpage is malicious. These blacklists are generally maintained by third-party organizations. However, maintaining a complete list of malicious websites and updating it regularly is difficult, because webpages change frequently and their number grows rapidly. In this study, we propose a novel context-sensitive and keyword density-based method for classifying webpages using three supervised machine learning techniques: support vector machine, maximum entropy, and extreme learning machine. Features (words) are obtained from the HTML contents of webpages, and information is extracted using three feature extraction methods: existence of words, keyword frequencies, and keyword density. The performance of the proposed machine learning models is evaluated on a benchmark data set of one hundred thousand webpages. Experimental results show that the proposed method can detect malicious webpages with an accuracy of 98.24%, which is a significant improvement over state-of-the-art approaches.
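The three feature extraction methods named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything below is an assumption for illustration: the function names `tokenize_html` and `extract_features`, the lowercase alphanumeric tokenization, the choice to skip `<script>` and `<style>` bodies, and the definition of keyword density as a keyword's count divided by the total number of tokens on the page.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies (an assumption)."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def tokenize_html(html):
    """Return lowercase alphanumeric tokens from the text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def extract_features(html, vocabulary):
    """Build three feature vectors over a fixed keyword vocabulary.

    existence: 1 if the keyword occurs on the page, else 0
    frequency: raw occurrence count of the keyword
    density:   count / total tokens (assumed definition, 0.0 for empty pages)
    """
    tokens = tokenize_html(html)
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "existence": [1 if counts[w] > 0 else 0 for w in vocabulary],
        "frequency": [counts[w] for w in vocabulary],
        "density": [counts[w] / total if total else 0.0 for w in vocabulary],
    }


if __name__ == "__main__":
    page = "<html><body><p>Free prize! Click to win a free prize.</p></body></html>"
    print(extract_features(page, ["free", "prize", "login"]))
```

Each vector has one entry per vocabulary keyword, so any of the three could serve as the input representation for a supervised classifier such as a support vector machine. How the paper selects its vocabulary and combines these features is not stated in the abstract.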


Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications
Publication Date: Feb 10, 2018
Citations: 30

Similar Papers

Combined electricity-heat-cooling-gas load forecasting model for integrated energy system based on multi-task learning and least square support vector machine
Zhongfu Tan ... Qinkun Tan
Journal of Cleaner Production | VOL. 248
12 Nov 2019

Modified Red Blue Vegetation Index for Chlorophyll Estimation and Yield Prediction of Maize from Visible Images Captured by UAV
Yahui Guo ... Christopher Robin Bryant
Sensors (Basel, Switzerland) | VOL. 20
05 Sep 2020

Two-Phase Malicious Web Page Detection Scheme Using Misuse and Anomaly Detection
Suyeon Yoo
The International Journal of Reliable Information and Assurance | VOL. 2
30 Jun 2014

Online Surface Defect Identification of Cold Rolled Strips Based on Local Binary Pattern and Extreme Learning Machine
Yang Liu ... Dadong Wang
Metals | VOL. 8
20 Mar 2018
