Abstract

The evolving digital economy generates multifaceted behavioral tracking data such as internet clickstreams, location trajectories, or taste preferences revealed by music or video streaming. Organizations are increasingly interested in using such data streams to profile customers based on their behavioral similarities for targeting purposes. However, measuring similarity in sequential data is a challenging task. We present a generic deep neural-network-based framework for quantifying the similarity of ordered sequences in observed event histories. This novel approach combines a specific type of recurrent neural network with a triplet loss function used for network training. The approach yields an embedding space that serves as a similarity metric for complex sequential data, and it can handle multivariate sequences and incorporate covariates. We empirically validate the derived similarity metric for user embeddings by re-identifying users from their web browsing histories. We demonstrate its superior performance in discriminating users based on their behavioral browsing patterns by benchmarking against more conventional approaches to measuring sequence similarity. In addition, we show that the methodology can be used for clustering sub-sequences and re-classifying users based on their observed clickstream behavior. Finally, we critically reflect on the benefits and possible downsides of the proposed framework and discuss extensions and promising future applications. An open-source reference implementation is available at github.com/vamosi/tl_rnn.
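
To make the training objective concrete, the following is a minimal sketch of a generic triplet loss computed on embedding vectors that are already available. It is not taken from the reference implementation: the Euclidean distance, the margin value of 1.0, and the function name `triplet_loss` are assumptions chosen for illustration, and the recurrent network that would produce the embeddings is omitted.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances, averaged over a batch.

    Each argument is an array of shape (batch, embedding_dim).
    The margin of 1.0 is an illustrative assumption, not a value from the paper.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor to same-user sequence
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor to other-user sequence
    # The loss is zero once the negative is at least `margin` farther away than the positive.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Toy embeddings (hypothetical values): the positive is close, the negative is far.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])
negative = np.array([[3.0, 4.0]])

loss_easy = triplet_loss(anchor, positive, negative)  # triplet already well separated
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: large penalty
print(loss_easy, loss_hard)
```

Minimizing a loss of this form pulls embeddings of sequences from the same user together and pushes embeddings of different users apart, which is what would allow distances in the embedding space to act as a similarity metric.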
