Abstract

Clustering high-dimensional data under the curse of dimensionality is an arduous task in many application domains. The large number of dimensions raises complexity-related challenges, while the limited number of records leads to overfitting. We propose to tackle this problem using the graphical and probabilistic power of Bayesian networks. Our contribution is a new loose hierarchical Bayesian network model that incorporates latent variables. These hidden variables are introduced to ensure a multi-view clustering of the records. We also propose a new framework for learning the model: it starts by extracting cliques of highly dependent features, then learns a representative latent variable for each feature clique. The experimental results of our comparative analysis demonstrate the efficiency of our model in tackling the distance-concentration challenge, and show that our learning framework avoids overfitting on benchmark high-dimensional datasets.
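The two-stage idea in the abstract (group highly dependent features, then summarize each group by one latent variable) can be illustrated with a minimal sketch. This is not the authors' algorithm: it substitutes thresholded absolute correlation for the paper's dependency measure, connected components for cliques, and a first principal component for the learned latent variable; the threshold and the synthetic data are illustrative assumptions.

```python
import numpy as np

def feature_groups(X, threshold=0.6):
    """Group features whose absolute pairwise correlation exceeds
    `threshold` (connected components stand in for the paper's
    cliques of highly dependent features)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    adj = corr > threshold
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            f = stack.pop()
            if f in seen:
                continue
            seen.add(f)
            comp.append(f)
            stack.extend(g for g in range(n) if adj[f, g] and g not in seen)
        groups.append(sorted(comp))
    return groups

def latent_views(X, groups):
    """One latent variable per group: the group's first principal
    component, a continuous stand-in for the model's hidden variables."""
    views = []
    for g in groups:
        Xg = X[:, g] - X[:, g].mean(axis=0)
        # The leading right singular vector gives the 1-D projection.
        _, _, vt = np.linalg.svd(Xg, full_matrices=False)
        views.append(Xg @ vt[0])
    return np.column_stack(views)

# Synthetic data: six features generated from two hidden factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack(
    [base[:, 0] + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [base[:, 1] + 0.1 * rng.normal(size=200) for _ in range(3)]
)
groups = feature_groups(X)
Z = latent_views(X, groups)
print(len(groups), Z.shape)  # two groups recovered; records now live in 2-D
```

Clustering would then run on the compact latent representation `Z` rather than the raw features, which is how such a reduction sidesteps distance concentration.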
