Abstract
Automated programs (bots) are responsible for a large percentage of website traffic. These bots can be used either for benign purposes, such as Web indexing, Website monitoring (validation of hyperlinks and HTML code), feed fetching, and data extraction for commercial use, or for malicious ones, including, but not limited to, content scraping, vulnerability scanning, account takeover, distributed denial of service attacks, marketing fraud, carding and spam. To protect themselves, Web servers try to identify bot sessions and apply special rules to them, such as throttling their requests or delivering different content. The methods currently used for the identification of bots are based either purely on rule-based bot detection techniques or on a combination of rule-based and machine learning techniques. While current research has developed highly adequate methods for Web bot detection, the adequacy of these methods against Web bots that actively try to remain undetected has not been studied. For this reason, we created a Web bot detection framework and evaluated its ability to detect conspicuous bots separately from its ability to detect advanced Web bots. We assessed the proposed framework's performance using real HTTP traffic from a public Web server. Our experimental results show that the proposed framework is highly effective at detecting Web bots that do not try to hide their bot identity using HTTP Web logs (balanced accuracy above 95% in a false-positive-intolerant server). However, detecting advanced Web bots that present a browser fingerprint, and possibly humanlike behaviour as well, is considerably more difficult.
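As an illustration of how such a framework might score sessions from HTTP Web logs, the following is a minimal sketch in Python. It assumes an Apache-style combined log format, sessions keyed by client IP and user agent, and three toy features (request count, error ratio, mean inter-request gap); the paper's actual feature set and model are not reproduced here.

```python
"""Minimal sketch: session features from HTTP Web logs plus a supervised classifier.

Assumptions (not from the paper): Apache combined log format, sessions keyed by
(client IP, user agent), and three illustrative per-session features.
"""
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Apache "combined" log format: ip - - [timestamp] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def session_features(log_lines):
    """Group requests per (ip, user agent) and compute toy per-session features."""
    sessions = defaultdict(list)
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        sessions[(m["ip"], m["agent"])].append((ts, int(m["status"])))

    features = {}
    for key, requests in sessions.items():
        requests.sort()
        gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(requests, requests[1:])]
        features[key] = [
            len(requests),                                            # request count
            sum(1 for _, s in requests if s >= 400) / len(requests),  # error ratio
            mean(gaps) if gaps else 0.0,                              # mean inter-request gap
        ]
    return features


def train_and_evaluate(X, y):
    """Fit a classifier on labelled sessions and report balanced accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return balanced_accuracy_score(y_te, clf.predict(X_te))
```

Given sessions labelled as human or bot, `train_and_evaluate` returns the balanced accuracy, the same metric quoted in the evaluation above; the model and features here are placeholders.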
Highlights
The vast amount of content hosted on the Internet has rendered the use of Web bots necessary
The most widely used techniques for Web bot detection are based on the CAPTCHA (i.e. Completely Automated Public Turing test to tell Computers and Humans Apart) [28], such as the reCAPTCHA offered by Google
The purpose of this paper is to identify the unique challenges that arise when state-of-the-art Web bot detection techniques are utilised for detecting advanced Web bots as opposed to simple bots
Summary
The vast amount of content hosted on the Internet has rendered the use of Web bots necessary. Popular uses of Web bots include Web indexing, Website monitoring (validation of hyperlinks and HTML code), data extraction for commercial purposes and feed fetching. To perform these actions, bots visit Web servers repeatedly and, in some cases, for a prolonged period of time [10]. Allowing bots unrestricted access to Web server content and services is not a good practice. The most widely used techniques for Web bot detection are based on the CAPTCHA (i.e. Completely Automated Public Turing test to tell Computers and Humans Apart) [28], such as the reCAPTCHA offered by Google. The test relies on the assumption that a human can extract letters from a distorted image or an audio file, or select an object in an image, while a Web bot cannot.
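To make the challenge–response assumption concrete, the sketch below outlines a generic text-CAPTCHA flow, not Google's reCAPTCHA API: the server issues a random code that would be rendered as a distorted image or audio clip, keeps only an expiring hash of it, and accepts the answer only if it matches before the deadline. All names and the timeout value are illustrative.

```python
"""Minimal sketch of a generic text-CAPTCHA challenge/response flow.

Illustrative only: it shows the server-side bookkeeping (issue a random code,
store an expiring hash, verify the answer), not the image/audio distortion step
or any vendor API.
"""
import hashlib
import hmac
import secrets
import string
import time

CHALLENGE_TTL = 120  # seconds a challenge stays valid (illustrative value)
_pending = {}        # challenge_id -> (answer_hash, expiry_timestamp)


def _digest(text: str) -> str:
    """Normalise and hash an answer so the plaintext code is not stored."""
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()


def issue_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge; the code would be rendered as a distorted image or audio."""
    code = "".join(
        secrets.choice(string.ascii_uppercase + string.digits) for _ in range(length)
    )
    challenge_id = secrets.token_urlsafe(16)
    _pending[challenge_id] = (_digest(code), time.time() + CHALLENGE_TTL)
    return challenge_id, code  # in practice only the rendered form leaves the server


def verify_response(challenge_id: str, answer: str) -> bool:
    """Accept only if the answer matches before the challenge expires (one attempt)."""
    record = _pending.pop(challenge_id, None)
    if record is None:
        return False
    answer_hash, expires_at = record
    if time.time() > expires_at:
        return False
    return hmac.compare_digest(answer_hash, _digest(answer))
```

A human who transcribes the distorted code passes `verify_response`, while a simple bot that cannot read the rendering fails; this is the asymmetry the test relies on, and it is precisely what advanced bots try to undermine.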