Application of non parametric Bayesian methods in high dimensional data

Yunqing Xia

doi:10.3233/jcm-237104

Abstract

With advances in technology and the widespread collection of data, high-dimensional data analysis has become a research hotspot in many fields. Traditional parametric methods often run into the curse of dimensionality when applied to high-dimensional data. Nonparametric methods do not rely on assumptions about a specific parametric distribution, so they have broad application prospects for such data. Bayesian methods are well suited to handling noise and outliers in high-dimensional data because they explicitly account for uncertainty. Combining nonparametric and Bayesian methods is therefore of considerable value for high-dimensional data analysis. In this paper, nonparametric Bayesian methods were applied to high-dimensional data: a Dirichlet process mixture model was used to cluster the data, a nonparametric Bayesian regression model was used for prediction in regression analysis, and a nonparametric Bayesian method based on a Bayesian sparse linear model was used for feature selection. To determine whether nonparametric Bayesian methods are superior for high-dimensional data analysis, simulation experiments compared them with traditional parametric methods on five tasks (cluster analysis, classification, regression, feature selection, and anomaly detection), and the methods were evaluated with multiple metrics.
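To illustrate how a Dirichlet process mixture model can cluster data without fixing the number of clusters in advance, here is a minimal sketch of a collapsed Gibbs sampler for a one-dimensional Gaussian DP mixture. It is not the paper's implementation. The function name `dpmm_gibbs`, the known within-cluster variance `sigma2`, the conjugate normal prior `N(mu0, tau2)` on cluster means, and the concentration `alpha` are all assumptions chosen for illustration. Real high-dimensional data would need a multivariate likelihood.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def dpmm_gibbs(data, alpha=1.0, sigma2=0.25, mu0=0.0, tau2=100.0,
               sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D Dirichlet process Gaussian mixture.

    Illustrative assumptions (not from the paper): each cluster has a
    Gaussian likelihood with known variance sigma2 and an unknown mean
    with conjugate prior N(mu0, tau2); alpha is the DP concentration.
    Returns one cluster label per data point.
    """
    rng = random.Random(seed)
    n = len(data)
    labels = [0] * n          # start with every point in a single cluster
    counts = {0: n}           # cluster id -> number of member points
    sums = {0: sum(data)}     # cluster id -> sum of member points
    next_id = 1               # id to use if a new cluster is opened

    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]

            # Existing clusters: CRP prior log(n_k) plus the log posterior
            # predictive density of x given that cluster's other members.
            options, logw = [], []
            for k, n_k in counts.items():
                prec = 1.0 / tau2 + n_k / sigma2
                mean = (mu0 / tau2 + sums[k] / sigma2) / prec
                options.append(k)
                logw.append(math.log(n_k)
                            + normal_logpdf(x, mean, 1.0 / prec + sigma2))
            # New cluster: log(alpha) plus the log prior predictive density.
            options.append(next_id)
            logw.append(math.log(alpha) + normal_logpdf(x, mu0, tau2 + sigma2))

            # Normalise in log space to avoid underflow, then sample a cluster.
            m = max(logw)
            weights = [math.exp(w - m) for w in logw]
            k = rng.choices(options, weights=weights)[0]
            if k == next_id:
                next_id += 1
                counts[k] = 0
                sums[k] = 0.0
            labels[i] = k
            counts[k] += 1
            sums[k] += x
    return labels


if __name__ == "__main__":
    # Toy data: two well-separated groups (made up, not the paper's data).
    gen = random.Random(1)
    data = ([gen.gauss(-5.0, 0.5) for _ in range(20)]
            + [gen.gauss(5.0, 0.5) for _ in range(20)])
    labels = dpmm_gibbs(data)
    print("occupied clusters:", len(set(labels)))
```

Each point joins an existing cluster with probability proportional to the cluster's size times the predictive density, or opens a new cluster with probability proportional to `alpha` times the prior predictive density. This is what lets the sampler infer the number of clusters from the data instead of fixing it beforehand.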
The experimental results show that the nonparametric Bayesian clustering algorithm reached a clustering accuracy of 0.93. The nonparametric Bayesian classification algorithm achieved accuracies between 0.93 and 0.99, and the nonparametric Bayesian regression algorithm achieved a coefficient of determination of 0.98. In anomaly detection, the nonparametric Bayesian methods reached F1 scores between 0.86 and 0.91. All of these results were better than those of the traditional methods. Nonparametric Bayesian methods therefore have broad application prospects in high-dimensional data analysis and can be applied to tasks such as clustering, classification, and regression.
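The coefficient of determination and the F1 score reported above have standard definitions: R² = 1 − SS_res/SS_tot and F1 = 2PR/(P + R), where P is precision and R is recall. The minimal sketch below computes both. The function names are hypothetical, the toy inputs are made up and are not the paper's data, and for anomaly detection the "positive" class is assumed to be the anomaly label.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(r_squared([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))
    # Here label 1 plays the role of "anomaly".
    print(f1_score([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```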
