Abstract

In most data classification applications, the data sets contain both continuous and categorical variables; in other words, multivariate data sets containing mixtures of continuous and categorical variables arise frequently in practice. This paper presents a novel Probability Neural Network (PNN) that can classify data with both continuous and categorical input variables. The case with only continuous or only categorical input variables is a special case of this mixture, so the proposed PNN also applies to those two cases. The Expectation Maximisation (EM) algorithm is widely used for mixture models of continuous variables but is not applicable to categorical variables. In this paper, a mixture model of continuous and categorical variables is used to construct a Probability Density Function (PDF), which is the key component of the PNN. The proposed PNN has two advantages over conventional algorithms such as the Multilayer Perceptron (MLP) neural network. First, the PNN produces better results than the MLP neural network, even when the MLP is given normalised input variables, which normally yield better results than non-normalised inputs. Second, the PNN needs no cross-validation data set and does not overtrain the way the MLP neural network does. Both advantages are demonstrated in our experimental study. The proposed PNN can also be used for unsupervised cluster analysis. The superiority of the PNN over the MLP neural network is demonstrated by applying both to a real-life data set, the Trauma data set, which includes both continuous and categorical variables.
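To illustrate the general PNN idea on mixed data, the sketch below scores each class by a Parzen-style product kernel: a Gaussian kernel over the continuous features and a simple matching kernel over the categorical ones, weighted by the class prior. The kernel choices, the bandwidth `sigma`, the matching weight `lam`, and the function name `pnn_predict` are illustrative assumptions; this is not the paper's mixture-model PDF construction, only a minimal example of the classification rule a PNN applies.

```python
import numpy as np

def pnn_predict(X_train, y_train, x, cont_idx, cat_idx, sigma=0.5, lam=0.9):
    """Classify one sample x with a Parzen-style PNN over mixed-type data (illustrative sketch)."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # Gaussian kernel over the continuous coordinates.
        d2 = ((Xc[:, cont_idx] - x[cont_idx]) ** 2).sum(axis=1)
        k_cont = np.exp(-d2 / (2.0 * sigma ** 2))
        # Simple matching kernel over the categorical coordinates:
        # weight lam on an exact match, (1 - lam) otherwise.
        match = Xc[:, cat_idx] == x[cat_idx]
        k_cat = np.where(match, lam, 1.0 - lam).prod(axis=1)
        # Class score = class prior * average mixed-type kernel density.
        scores.append(len(Xc) / len(X_train) * np.mean(k_cont * k_cat))
    return classes[int(np.argmax(scores))]

# Tiny hypothetical example: two continuous features, one categorical feature.
X = np.array([[0.1, 1.2, 0],
              [0.2, 1.0, 0],
              [2.0, 3.1, 1],
              [2.2, 2.9, 1]], dtype=float)
y = np.array([0, 0, 1, 1])
print(pnn_predict(X, y, np.array([2.1, 3.0, 1.0]), cont_idx=[0, 1], cat_idx=[2]))
```

The query point is assigned to the class whose prior-weighted average kernel density is largest, which is the decision rule shared by PNN variants regardless of how the underlying PDF is estimated.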
