Abstract

A common and long-standing problem in statistics is the separation of a heterogeneous population into more homogeneous subpopulations, and a wide variety of approaches and techniques for tackling it now exist. The finite mixture model is one such technique: it is used to analyse a given data set and to form natural groups. The main advantage of mixture models is that they offer a very flexible and relatively easy way of fitting a model to a data set.

The main focus of this thesis is the clustering of data with the aid of mixed variable mixture models. Much research has been published on the analysis of continuous variables. This thesis, however, is concerned with techniques that can analyse data ranging from purely continuous variables to mixed variables, that is, a combination of continuous and discrete variables.

The expectation-maximization (EM) algorithm of Dempster et al. (1977) will be used in this thesis for the clustering of mixed variable data. As demonstrated in numerous publications, this algorithm estimates the model parameters iteratively and has the advantageous property that the likelihood never decreases from one iteration to the next. An extension of the EM algorithm, the ECM algorithm introduced by Meng and Rubin (1993), which replaces the M-step with a sequence of simpler conditional maximisation steps, will also be considered for its role in fitting the more complicated mixed mixture models.

For the mixed mixture models, this thesis uses a joint distribution specified as the conditional distribution of the continuous variables given the values of the discrete variables, multiplied by the marginal distribution of the discrete variables. Three models (naive, logistic and multinomial) are considered for the discrete variables and two models (independent and location) for the continuous variables. Combining these gives a total of six models for the analysis of mixed variable data. The design and development of the program EMM will also be discussed.
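Written out for a mixture with g components, the joint distribution described above takes the form below. The notation is assumed here and not taken from the thesis: x_i stands for the discrete variables of observation i, y_i for its continuous variables, π_k for the mixing proportions and τ_ik for the posterior component probabilities. The naive, independent and location special cases reflect our reading of those model names, not the thesis's exact definitions.

```latex
% Mixed mixture density: marginal of the discrete part times conditional of the continuous part
f(\mathbf{x}_i, \mathbf{y}_i)
  = \sum_{k=1}^{g} \pi_k \, p_k(\mathbf{x}_i) \, f_k(\mathbf{y}_i \mid \mathbf{x}_i),
  \qquad \sum_{k=1}^{g} \pi_k = 1 .

% Naive discrete model (assumed): discrete variables independent within a component
p_k(\mathbf{x}_i) = \prod_{j} p_{kj}(x_{ij}) .

% Independent continuous model (assumed): y does not depend on x within a component
f_k(\mathbf{y}_i \mid \mathbf{x}_i) = f_k(\mathbf{y}_i) .

% Location model (assumed): normal with a mean that depends on the cell defined by x
\mathbf{y}_i \mid \mathbf{x}_i, k \sim N\!\bigl(\boldsymbol{\mu}_k(\mathbf{x}_i), \boldsymbol{\Sigma}_k\bigr) .

% E-step posterior probabilities, and the EM monotonicity property for the
% observed-data log-likelihood L(\theta) = \sum_i \log f(\mathbf{x}_i, \mathbf{y}_i; \theta)
\tau_{ik}^{(t)} =
  \frac{\pi_k^{(t)} \, p_k^{(t)}(\mathbf{x}_i) \, f_k^{(t)}(\mathbf{y}_i \mid \mathbf{x}_i)}
       {\sum_{l=1}^{g} \pi_l^{(t)} \, p_l^{(t)}(\mathbf{x}_i) \, f_l^{(t)}(\mathbf{y}_i \mid \mathbf{x}_i)},
\qquad
L(\boldsymbol{\theta}^{(t+1)}) \ge L(\boldsymbol{\theta}^{(t)}) .
```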
Many real mixed data sets and some simulated data sets will also be analysed to investigate how the proposed methods compare with existing methods. Some of these data sets will be transformed to create suitable input for the program, so that its output can be compared with that of the existing methods. The program EMM is written in FORTRAN and is organised modularly, with each segment arranged in a separate subroutine. This makes it easy to modify any part of the program, or to add a new subroutine that extends the capabilities of the models.
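The following is a minimal pure-Python sketch of an EM iteration for the simplest combination of models. It assumes that "naive" means the discrete variables are conditionally independent categorical variables within a component. It also assumes that "independent" means the continuous variables are normal with their own means and variances and do not depend on the discrete values. This is not the EMM program, which is written in FORTRAN. The function name `em_naive_independent`, the band-based starting values and the variance floor are hypothetical choices made only for illustration. The recorded log-likelihood trace lets you check the monotonicity property of EM mentioned above.

```python
import math
import random


def log_normal(y, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def safe_log(x):
    """log(x), returning -inf for a zero probability instead of raising."""
    return math.log(x) if x > 0.0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_naive_independent(cont, disc, n_levels, g, n_iter=100):
    """EM for a g-component mixture of mixed data (hypothetical naive + independent model).

    cont: n lists of p floats; disc: n lists of q ints coded 0..n_levels[j]-1.
    Returns mixing proportions, component means and the observed-data
    log-likelihood recorded at every iteration.
    """
    n, p, q = len(cont), len(cont[0]), len(disc[0])
    # Hypothetical deterministic start: softly assign points to g bands
    # ranked by the first continuous variable.
    order = sorted(range(n), key=lambda i: cont[i][0])
    band = [0] * n
    for rank, i in enumerate(order):
        band[i] = rank * g // n
    resp = [[(1 + 4 * (k == band[i])) / (g + 4) for k in range(g)] for i in range(n)]
    trace = []
    for _ in range(n_iter):
        # M-step: closed-form weighted maximum-likelihood estimates.
        nk = [sum(r[k] for r in resp) for k in range(g)]
        pi = [nk[k] / n for k in range(g)]
        mean = [[sum(resp[i][k] * cont[i][j] for i in range(n)) / nk[k]
                 for j in range(p)] for k in range(g)]
        # Variance floor (assumed) guards against a collapsing component.
        var = [[max(sum(resp[i][k] * (cont[i][j] - mean[k][j]) ** 2
                        for i in range(n)) / nk[k], 1e-6)
                for j in range(p)] for k in range(g)]
        prob = [[[sum(resp[i][k] for i in range(n) if disc[i][j] == c) / nk[k]
                  for c in range(n_levels[j])] for j in range(q)] for k in range(g)]
        # E-step: posterior probabilities and observed-data log-likelihood.
        loglik, new_resp = 0.0, []
        for i in range(n):
            logs = []
            for k in range(g):
                lp = math.log(pi[k])
                lp += sum(log_normal(cont[i][j], mean[k][j], var[k][j]) for j in range(p))
                lp += sum(safe_log(prob[k][j][disc[i][j]]) for j in range(q))
                logs.append(lp)
            lse = logsumexp(logs)
            loglik += lse
            new_resp.append([math.exp(v - lse) for v in logs])
        resp = new_resp
        trace.append(loglik)
    return pi, mean, trace


if __name__ == "__main__":
    # Hypothetical synthetic data: two groups differing in both variable types.
    rng = random.Random(1)
    cont, disc = [], []
    for i in range(100):
        centre, level = (0.0, 0) if i < 50 else (5.0, 2)
        cont.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        disc.append([level if rng.random() < 0.8 else rng.randrange(3)])
    pi, mean, trace = em_naive_independent(cont, disc, n_levels=[3], g=2)
    print("mixing proportions:", [round(x, 3) for x in pi])
    print("log-likelihood never decreased:",
          all(b >= a - 1e-8 for a, b in zip(trace, trace[1:])))
```

Every M-step update here is a closed-form weighted estimate. As long as the variance floor is not hit, each iteration therefore maximises the expected complete-data log-likelihood exactly, which is what guarantees a non-decreasing trace. The logistic, multinomial and location variants may lack such simple updates. That is presumably where the ECM algorithm's sequence of simpler conditional maximisation steps becomes useful.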

