Improving hidden Markov models with a similarity histogram for typing pattern biometrics

Weide Chang

doi:10.1109/iri-05.2005.1506521

Abstract

Typing patterns offer a highly feasible biometric for user authentication: by examining and identifying the patterns that distinguish individual typists, a system can authenticate users at any time during a session. We interpret keystroke data as four-dimensional timing vectors, so that clusters of keystroke vectors form the static basis for analysis. Each key transition also carries a more dynamic probability distribution, which can be used to form a Markov chain. For a particular key, there is ideally one cluster of vectors that corresponds to it; but since the observed clusters usually contain vectors from several different keys, an observation yields only a set of probabilities for which key was actually typed. This can be modeled as a hidden state being observed in a hidden Markov model (HMM). In our previous research, we implemented a user-authentication process with an HMM that learned the typing patterns specific to individuals and then identified them. During the training stage, the timing information of each keystroke within a word was gathered from each user as the user typed the word repeatedly. The key-transition and observation probability matrices for each user were then built from the gathered typing data. Afterward, the model was tested with typing data from words that each user had entered separately, to match against their profiles. We chose to examine the typing patterns of users' login names, since these would be the most frequently typed individual words. We found the HMM approach well suited to our process because of the stochastic nature of typing patterns, and because it reveals pattern distributions and predicts possible keystroke sequences. However, in the first-order non-ergodic HMM that was applied, we noted an issue with how accurately the probability matrices translate observations.
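The training and testing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each keystroke's timing vector has already been assigned to a cluster index, that training samples are labeled with the key actually typed (the login name is known), and that add-one smoothing is acceptable. Note that smoothing makes every transition non-zero, whereas a strictly non-ergodic model would keep some transitions at zero. The names `estimate_hmm` and `forward_likelihood` are hypothetical.

```python
def estimate_hmm(samples, n_clusters, smoothing=1.0):
    """Estimate start, key-transition, and observation probabilities.

    samples: non-empty typed samples; each is a list of (key, cluster_id)
    pairs. smoothing must be positive so that every row can be normalized.
    """
    keys = sorted({key for sample in samples for key, _ in sample})
    start = {k: smoothing for k in keys}
    trans = {a: {b: smoothing for b in keys} for a in keys}
    emit = {k: [smoothing] * n_clusters for k in keys}

    # Count first keys, key-to-cluster observations, and key transitions.
    for sample in samples:
        start[sample[0][0]] += 1
        for key, cluster in sample:
            emit[key][cluster] += 1
        for (a, _), (b, _) in zip(sample, sample[1:]):
            trans[a][b] += 1

    # Turn counts into probability distributions.
    total = sum(start.values())
    start = {k: v / total for k, v in start.items()}
    for a in keys:
        row_total = sum(trans[a].values())
        trans[a] = {b: v / row_total for b, v in trans[a].items()}
        emit_total = sum(emit[a])
        emit[a] = [v / emit_total for v in emit[a]]
    return keys, start, trans, emit


def forward_likelihood(model, observations):
    """Probability of a cluster-index sequence under the model (forward algorithm)."""
    keys, start, trans, emit = model
    alpha = {k: start[k] * emit[k][observations[0]] for k in keys}
    for cluster in observations[1:]:
        alpha = {
            b: sum(alpha[a] * trans[a][b] for a in keys) * emit[b][cluster]
            for b in keys
        }
    return sum(alpha.values())


# Hypothetical data: a user types "ab" five times; key "a" falls in
# cluster 0 and key "b" in cluster 1 every time.
samples = [[("a", 0), ("b", 1)]] * 5
model = estimate_hmm(samples, n_clusters=2)
genuine = forward_likelihood(model, [0, 1])   # matches the trained pattern
impostor = forward_likelihood(model, [1, 0])  # clusters in the wrong order
```

Here `genuine` comes out larger than `impostor`, so a likelihood threshold between the two would separate them. How that threshold is chosen is left open in this sketch.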
Extending our previous work, we propose a method that improves the process with a histogram of the similarity measured between each actual observation and the cluster centroid that it resembles. Such a histogram can be readily applied to higher-order models as the key factor in setting matching thresholds. The adaptive histogram improves the experimental results.
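One way to picture a similarity histogram used for setting a matching threshold is sketched below. The abstract does not define the similarity measure or the threshold rule, so both are assumptions here: similarity is taken as 1 / (1 + Euclidean distance) between a four-dimensional timing vector and its cluster centroid, and the threshold is the lower edge of the histogram bin at which a chosen fraction of training similarities has been passed. All function names and parameter values are hypothetical.

```python
import math


def similarity(vector, centroid):
    """Assumed measure: maps Euclidean distance into (0, 1]; 1.0 means identical."""
    distance = math.sqrt(sum((v - c) ** 2 for v, c in zip(vector, centroid)))
    return 1.0 / (1.0 + distance)


def build_histogram(similarities, n_bins=10):
    """Count similarities in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in similarities:
        index = min(int(s * n_bins), n_bins - 1)  # s == 1.0 goes in the last bin
        counts[index] += 1
    return counts


def threshold_from_histogram(counts, reject_fraction=0.1):
    """Lower edge of the first bin where cumulative mass exceeds reject_fraction."""
    total = sum(counts)
    cumulative = 0
    for index, count in enumerate(counts):
        cumulative += count
        if cumulative > reject_fraction * total:
            return index / len(counts)
    return 0.0


def matches(vector, centroid, threshold):
    """Accept an observation only if it is similar enough to its centroid."""
    return similarity(vector, centroid) >= threshold


# Hypothetical training similarities for one user's keystroke cluster.
training = [0.05, 0.55, 0.95, 1.0]
histogram = build_histogram(training)
threshold = threshold_from_histogram(histogram, reject_fraction=0.3)
```

Because the threshold is derived from each user's own training histogram, it adapts to how tightly that user's keystrokes cluster rather than relying on one fixed cutoff for everyone.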

Full Text