Abstract

As a step toward understanding the complex information in data and relationships, structural and discriminative knowledge reveals insight that may prove useful in data interpretation and exploration. This paper reports the development of an automated and intelligent procedure for generating a hierarchy of minimax entropy models and principal component visualization spaces for improved data explanation. The proposed hierarchical minimax entropy modeling and probabilistic principal component projection are both statistically principled and visually effective at revealing all of the interesting aspects of the data set. The methods involve multiple use of standard finite normal mixture models and probabilistic principal component projections. The strategy is that the top-level model and projection should explain the entire data set, best revealing the presence of clusters and relationships, while lower-level models and projections should display internal structure within individual clusters, such as the presence of subclusters and attribute trends, which might not be apparent in the higher-level models and projections. With many complementary mixture models and visualization projections, each level remains relatively simple while the complete hierarchy maintains overall flexibility and still conveys considerable structural information. In particular, a model identification procedure is developed to select the optimal number and kernel shapes of local clusters from a class of data, resulting in a standard finite normal mixture with minimum conditional bias and variance, and a probabilistic principal component neural network is advanced to generate optimal projections, leading to a hierarchical visualization algorithm that allows the complete data set to be analyzed at the top level and the best-separated subclusters of data points to be analyzed at deeper levels. Hierarchical probabilistic principal component visualization involves (1) evaluation of posterior probabilities for the mixture data set, (2) estimation of multiple principal component axes from the probabilistic data set, and (3) generation of a complete hierarchy of visual projections. With a soft clustering of the data set t_i via the EM algorithm, data points effectively belong to more than one cluster at any given level, with posterior probabilities denoted by z_ik. Thus, the effective input values are z_ik t_i for an independent visualization space k in the hierarchy. Further projections can again be performed using the effective input values z_ik z_j|k t_i for the visualization subspace j. The complete visual explanation hierarchy is generated by performing principal projection and model identification in two iterative steps using information theoretic criteria, the EM algorithm, and probabilistic principal component analysis.
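The hierarchical projection scheme described above, in which soft EM responsibilities z_ik weight the data t_i, each cluster receives its own principal component projection, and sub-mixture responsibilities z_j|k refine the weights at deeper levels, can be sketched with standard tools. The sketch below is not the paper's algorithm: it substitutes scikit-learn's GaussianMixture with BIC-based order selection for the paper's minimax entropy model identification, and a responsibility-weighted eigendecomposition for the probabilistic principal component neural network. The function name weighted_pc_projection, the synthetic data, and the resampling step for the second level are illustrative assumptions only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in data: three separated clusters in 5 dimensions.
T = np.vstack([rng.normal(m, 1.0, size=(200, 5)) for m in (0.0, 4.0, 8.0)])

# Model identification (sketch): choose the number of kernels by an
# information criterion (BIC here, in place of the paper's criterion).
candidates = [GaussianMixture(n_components=k, covariance_type="full",
                              random_state=0).fit(T) for k in range(1, 7)]
gmm = min(candidates, key=lambda m: m.bic(T))

# Top level: posterior responsibilities z_ik from the EM-fitted mixture.
Z = gmm.predict_proba(T)            # shape (N, K), soft cluster memberships


def weighted_pc_projection(T, r, n_axes=2):
    """Responsibility-weighted principal component projection for one cluster.

    T : (N, d) data, r : (N,) posteriors z_ik acting as soft weights, so each
    point contributes in proportion to z_ik (the effective inputs z_ik t_i).
    Returns 2-D coordinates for plotting in that cluster's subspace.
    """
    w = r / r.sum()                          # normalized responsibilities
    mu = w @ T                               # weighted mean
    Tc = T - mu                              # center on the weighted mean
    cov = (w[:, None] * Tc).T @ Tc           # weighted covariance matrix
    _, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    W = vecs[:, -n_axes:][:, ::-1]           # leading principal axes
    return Tc @ W                            # project onto the axes


# One visualization subspace per top-level cluster k.
top_level = [weighted_pc_projection(T, Z[:, k]) for k in range(Z.shape[1])]

# Second level (sketch): refine cluster k with a sub-mixture fitted on a
# responsibility-weighted resample, then use effective weights z_ik * z_j|k.
k = 0
idx = rng.choice(len(T), size=len(T), p=Z[:, k] / Z[:, k].sum())
sub = GaussianMixture(n_components=2, random_state=0).fit(T[idx])
Z_sub = sub.predict_proba(T)                 # approximate z_j|k
second_level = [weighted_pc_projection(T, Z[:, k] * Z_sub[:, j])
                for j in range(Z_sub.shape[1])]
```

In this sketch each level reuses the same weighted projection routine; only the responsibilities change, which mirrors the abstract's point that the complete hierarchy stays flexible while each individual level remains simple.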
