Abstract
Although a variety of techniques to detect malicious websites have been proposed, it is becoming increasingly difficult for these methods to deliver satisfactory results. Many malicious websites can still escape detection by using various Web spam techniques. In this paper, we first summarize three types of Web spam techniques used by malicious websites: redirection spam, hidden IFrame spam, and content hiding spam. We then present a new detection method that adopts the perspective of users and takes screenshots of malicious webpages to invalidate Web spam. The proposed detection method uses a Convolutional Neural Network, a class of deep neural networks, as its classification algorithm. To verify the effectiveness of the method, we conducted two experiments. First, the proposed method was tested on a constructed complex dataset, and we present comparison results between the proposed method and representative machine learning-based detection algorithms. Second, the proposed method was used to detect malicious websites in a real-world Web environment for three months. The experimental results show that the proposed method achieves better performance than the compared algorithms and is applicable to a practical Web environment.
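To make the screenshot-based idea concrete, the following is a minimal sketch of the forward pass of a single convolutional stage applied to a grayscale screenshot, followed by a logistic output. It is not the authors' architecture: the function names, the kernel, the weights, and the tiny image size are all hypothetical, and a real Convolutional Neural Network would stack many such stages and learn the kernel and weights from labeled screenshots.

```python
import math

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

def malicious_score(screenshot, kernel, weights, bias):
    """Hypothetical scorer: conv -> ReLU -> pool -> flatten -> logistic output in (0, 1)."""
    pooled = max_pool2x2(relu(conv2d_valid(screenshot, kernel)))
    features = [v for row in pooled for v in row]
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the input is the rendered page as a user would see it, a classifier of this kind does not depend on the page's HTML source; per the abstract, this is how the method invalidates redirection, hidden IFrame, and content hiding spam.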
Highlights
The Internet has become an indispensable part of people's lives.
EVALUATION: To verify the effectiveness of the method, we conducted two kinds of experiments: one on a constructed complex dataset and the other in a real-world Web environment.
1) CONSTRUCTED COMPLEX DATASET: To test whether the proposed method is effective and practical, we constructed a complex dataset, whose complexity is demonstrated as follows:
Summary
The Internet has become an indispensable part of people's lives. While the Internet brings prosperity, it also causes problems such as illegal websites, fake medical websites, and pornography and gambling sites. Although various detection techniques have been applied, the number of malicious websites continues to grow. The large amount of malicious information on the Internet is harmful to the health of Internet users, especially children and teenagers [1], [2]. Researchers have proposed many detection methods, including heuristic methods and machine learning-based methods. Today, machine learning methods are commonly used to analyze the text and image information of websites. However, driven by the lure of profit, malicious websites use a variety of Internet spam techniques to evade regulation.