Abstract
In this paper we investigate the effectiveness of ensemble-based learners for identifying web robot sessions in web server logs. We also perform multi-fold robot session labeling to improve the performance of the learners. We conduct a comparative study of various ensemble methods (Bagging, Boosting, and Voting) against simple classifiers in terms of classification performance. We also evaluate the effectiveness of these classifiers (both ensemble and simple) on five different data sets of varying session length. At present, the results of web server log analyzers are not very reliable because the input log files are highly inflated by sessions of automated web traversal software, known as web robots. The presence of web robot traffic entries in web server log repositories makes it very difficult to extract actionable and usable knowledge about the browsing behavior of actual visitors. Web robot sessions therefore need to be detected accurately and quickly in web server log repositories, both to extract knowledge about genuine visitors and to produce correct results from log analyzers.
Highlights
Web robots are autonomous agents used to browse the web in a mechanized and organized manner
If a session is a robot session and is classified as a robot session, it is counted as a true positive (TP); if it is classified as a human session, it is counted as a false negative (FN)
Second, to evaluate whether session length can improve the accuracy of web robot detection
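The TP and FN counting described in the highlights can be illustrated with a minimal sketch. The session labels below are hypothetical and not taken from the paper's data sets. The false positive (FP) and true negative (TN) cases are not defined in this excerpt; they are filled in here using the standard convention, with "robot" assumed to be the positive class. Recall is shown as one common metric derived from these counts, not necessarily the one the authors report.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="robot"):
    """Count TP, FN, FP, and TN, treating `positive` as the positive class."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            # Robot session: TP if detected as robot, FN if labeled human.
            counts["TP" if guess == positive else "FN"] += 1
        else:
            # Human session: FP if flagged as robot, TN otherwise
            # (standard convention, assumed here).
            counts["FP" if guess == positive else "TN"] += 1
    return counts

# Hypothetical labels for five sessions (illustrative only).
actual = ["robot", "robot", "human", "robot", "human"]
predicted = ["robot", "human", "human", "robot", "robot"]

counts = confusion_counts(actual, predicted)

# Recall: the fraction of real robot sessions that were detected.
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(dict(counts), round(recall, 3))
```

In this toy input, two of the three robot sessions are detected, so recall is 2/3.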
Summary
Web robots are autonomous agents used to browse the web in a mechanized and organized manner. They start from lists of seed URLs and recursively visit the hyperlinks reachable from those lists. To keep up with the huge volumes of time-sensitive information, they must perform comprehensive searches, offer focused functionality, and visit servers frequently. These demands have led to a dramatic increase in the number and types of robots, and in the intensity with which they visit servers [1]. Scrapers are used to automatically create copies of web sites for malicious purposes [2]. Since their inception (the first web robots were introduced in 1993), their numbers have been increasing exponentially.
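The traversal pattern described above, starting from seed URLs and following the hyperlinks reachable from them, can be sketched as follows. This is a simplified illustration, not the behavior of any particular robot: the `links` dictionary is a hypothetical in-memory stand-in for fetching pages and extracting their hyperlinks, and the recursion is written as an iterative breadth-first loop.

```python
from collections import deque

def crawl(seeds, links):
    """Visit every page reachable from `seeds`, breadth-first, once each."""
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(unique_seeds)
    queue = deque(unique_seeds)
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        # A real robot would download `url` here and parse its hyperlinks.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

# Hypothetical site structure: page -> hyperlinks found on that page.
links = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

order = crawl(["/"], links)
print(order)
```

The `seen` set keeps the robot from revisiting a page when links form a cycle, as `/b` linking back to `/` does here.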
More From: Journal of Data Analysis and Information Processing