Abstract

With the rapid development of web-based applications, clicking on hyperlinks has become a common means of accessing network services. Understanding the visiting behavior of web users not only helps improve personalized service quality and user experience, but also plays an important role in network management and early threat detection. Click-stream identification is a fundamental problem in user behavior analysis. However, most existing approaches are designed for unencrypted HTTP requests and focus only on server-side scenarios, which makes them inapplicable to the increasingly prevalent HTTPS traffic and to network-side management. In this work, we propose an encryption-independent scheme from a network-side perspective that uses web traffic collected at the network boundary to identify the HTTP(S) requests generated by the click actions of web users. The proposed scheme employs hidden Markov models (HMMs) to describe the time-varying behavior of click and non-click web traffic. A deep neural network (DNN) is integrated into the HMMs to capture the context of web traffic, which removes the limitations imposed by the independence assumption of traditional HMMs. Finally, a DNN-based rear classifier determines the type of each HTTP(S) request according to how well the request fits the HMM-based behavior models. We derive the algorithms for model learning and click identification, validate the proposed approach experimentally, and discuss performance-related issues and comparisons. Results show that both the average precision and the average recall of the proposed approach exceed 92%, which is better than most existing benchmark methods in both performance and stability.
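The idea of scoring a request sequence against separate click and non-click behavior models can be illustrated with a minimal sketch. It is not the proposed scheme: it uses plain discrete HMMs without the DNN integration, and it replaces the DNN-based rear classifier with a simple log-likelihood-ratio threshold. The observation encoding (0 for a short inter-request gap, 1 for a long one), the parameter values, and the names `log_likelihood`, `CLICK_MODEL`, `NONCLICK_MODEL`, and `classify` are all assumptions made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete HMM.

    All probabilities are assumed strictly positive, so math.log is safe.
    """
    n = len(start)
    # alpha[i] = log P(o_1..o_t, state_t = i)
    alpha = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + math.log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical toy parameters (not taken from the paper).
# Observation symbols: 0 = short inter-request gap, 1 = long gap.
CLICK_MODEL = dict(
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.2, 0.8], [0.4, 0.6]],  # click traffic favors long gaps
)
NONCLICK_MODEL = dict(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.3, 0.7]],
    emit=[[0.9, 0.1], [0.7, 0.3]],  # non-click traffic favors short gaps
)


def classify(obs, threshold=0.0):
    """Label a sequence by which model fits it better.

    A log-likelihood-ratio threshold stands in for the DNN rear classifier.
    """
    score = log_likelihood(obs, **CLICK_MODEL) - log_likelihood(obs, **NONCLICK_MODEL)
    return "click" if score > threshold else "non-click"
```

In this toy setting, `classify([1, 1, 1])` compares how well a run of long gaps fits each model. A learned classifier over the same fit scores, as the abstract describes, would replace the fixed threshold.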
