Modelling temporal patterns in user behaviour

Sebastian Dungs

doi:10.17185/duepublico/70034

Abstract

Modelling sequential data is one of the most challenging problems in machine learning research. The object of investigation can be records of user behaviour, which are analysed to uncover hidden temporal patterns. A broad range of solutions exists for this type of problem, including multi-space hidden Markov models (HMMs). The main strength of this technique is its ability to jointly model features on discrete and continuous scales, a property that conventional HMMs do not possess; multi-space HMMs are therefore well suited to modelling temporal patterns in combination with other features. So far, however, they have not been utilised to build temporal models of user behaviour. Based on a newly developed integrated framework for creating multi-space HMMs, user behaviour is modelled in two fields of research. By creating HMMs of two phases in user behaviour during a session search, prior qualitative information-seeking models are augmented by a quantitative component. In a series of experiments based on a search engine transaction log, it is shown that approximately one out of three search sessions reached the second phase, which is characterised by heightened effectiveness and efficiency of user actions. Furthermore, it is demonstrated how the search phase model can be used to estimate crucial parameters of a search session, for example the expected time to find the next relevant document. In the second practical application, the HMM framework’s versatility is highlighted by utilising the models as a classifier to detect rumourous conversations on Twitter and to model their veracity. Thus, this work complements prior research by using tweet stance and time as the only features to build a high-recall rumour detection system based on multi-space HMMs. The strength of jointly modelling the temporal component is especially evident in rumour veracity modelling, since the multi-space HMMs achieve state-of-the-art results. Further experiments show that the models are robust to noise and can provide timely veracity classifications.
