Abstract
Recent studies reported that about half of Web users nowadays are intelligent agents (Web bots). Many bots are impersonators operating at a very high sophistication level, trying to emulate navigational behaviors of legitimate users (humans). Moreover, bot technology continues to evolve which makes bot detection even harder. To deal with this problem, many advanced methods for differentiating bots from humans have been proposed, a large part of which relies on supervised machine learning techniques. In this paper, we propose a novel approach to identify various profiles of bots and humans which combines feature selection and unsupervised learning of HTTP-level traffic patterns to develop a user session classification model. Session clustering is performed with the agglomerative Information Bottleneck (aIB) algorithm, as well as with some other reference algorithms. The model is then used to classify new sessions to one of the profiles and to label the sessions as performed by bots or humans. An extensive experimental study, based on real server log data, demonstrates the ability of aIB clustering to distinguish user profiles and confirms high performance of the classification model in terms of accuracy, F1, recall, and precision.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.