Abstract

The hidden Markov model (HMM) has long been one of the most commonly used probabilistic graphical models for modeling sequential or time series data. It has been widely applied in fields ranging from speech recognition, face recognition, and anomaly detection to gene function prediction. In this paper, we propose a variant of the continuous HMM for modeling positive sequential data, which arise naturally in many real-life applications. In contrast with conventional HMMs, which often use Gaussian distributions or Gaussian mixture models as the emission probability density, we adopt the inverted Dirichlet mixture model as the emission density. This choice is motivated by several recent studies reporting that the inverted Dirichlet mixture model has superior modeling capability over Gaussian mixture models for positive data. In addition, we develop a convergence-guaranteed approach to learning the proposed inverted Dirichlet-based HMM through variational Bayes inference. The effectiveness of the proposed HMM is validated on both synthetic data sets and a real-world application to anomaly-based network intrusion detection. In the experiments, the proposed inverted Dirichlet-based HMM achieves detection accuracy rates about 4% to 9% higher than those obtained by the compared approaches.
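
For readers unfamiliar with the emission density named in the abstract, the following is a minimal sketch of the log-density of an inverted Dirichlet distribution and of a finite mixture of such components. It assumes the standard parameterization, in which a D-dimensional positive vector has D + 1 positive shape parameters; the paper's own notation may differ. The function names `inverted_dirichlet_logpdf` and `mixture_logpdf` are illustrative and do not come from the paper, and the sketch covers only density evaluation, not the variational Bayes learning procedure.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of an inverted Dirichlet distribution at a positive vector x.

    x:     sequence of D strictly positive values
    alpha: sequence of D + 1 strictly positive shape parameters (assumed
           standard parameterization; not taken from the paper)
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have exactly one more entry than x")
    if any(v <= 0 for v in x) or any(a <= 0 for a in alpha):
        raise ValueError("x and alpha must be strictly positive")
    alpha_sum = sum(alpha)
    # Log of the normalizing constant: Gamma(sum alpha) / prod Gamma(alpha_d)
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(a) for a in alpha)
    # zip stops after D terms, so only the first D shape parameters pair with x
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - alpha_sum * math.log1p(sum(x))


def mixture_logpdf(x, weights, alphas):
    """Log-density of a finite inverted Dirichlet mixture at x.

    weights: mixing coefficients (positive, summing to one)
    alphas:  one shape-parameter vector per mixture component
    """
    terms = [
        math.log(w) + inverted_dirichlet_logpdf(x, a)
        for w, a in zip(weights, alphas)
    ]
    # Log-sum-exp keeps the computation stable when the terms are very negative
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In an HMM of the kind described, each hidden state would carry its own mixture, and a function like `mixture_logpdf` would supply the per-state emission log-likelihood used by the forward-backward recursions.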

Highlights

  • The hidden Markov model (HMM) [1], [2] is one of the most commonly used probabilistic graphical models for modeling sequential or time series data, such as video, audio, and text

  • Using the proposed inverted Dirichlet-based HMM (iDHMM), we develop an unsupervised intrusion detection approach for detecting network-based attacks

  • The reported results show that the proposed iDHMM outperforms both DHMM and GHMM on the two data sets, demonstrating the advantages of using the inverted Dirichlet (ID)-based HMM for intrusion detection


Summary

Introduction

The hidden Markov model (HMM) [1], [2] is one of the most commonly used probabilistic graphical models for modeling sequential or time series data, such as video, audio, and text. It has been widely applied in fields ranging from handwritten word recognition, speech recognition, speech synthesis, face recognition, and anomaly detection to gene function prediction [3]–[8].
