Abstract

A criticism of using the Gaussian density function to model the classes in multispectral remote sensing data is that the classes may be multimodal and therefore not well represented by a Gaussian density. The normal mixture density function, which is the sum of one or more weighted Gaussian components, is a compromise between Gaussian and non-parametric density functions. It can model multimodal densities, yet requires fewer parameters to be estimated than non-parametric density functions. This is an advantage in remote sensing applications, where training samples are difficult and expensive to obtain. In practice, the number of components is usually not known and must be estimated from the training samples. A new approach for estimating the appropriate number of components in a normal mixture density is described. The approach has three steps: divide the data from each class into various numbers of clusters using the nearest means algorithm, compute a measure of how well the clusters represent the training data, and select the number of clusters that best fits the data. The measure of fit is computed by leaving one sample out and estimating the mean vectors and covariance matrices of the mixture density from the remaining samples. The likelihood of the left-out sample is then computed, and the process is repeated, each time leaving out a different sample. In this way, the samples used to test the estimates are independent of the samples used to compute them. As a result, the measure of fit, which is the average log likelihood of the left-out samples, does not increase monotonically with the number of clusters as the joint likelihood does, but reaches a maximum. The number of clusters that yields the maximum value of the measure of fit is selected as the estimate of the number of components in the mixture density.
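The leave-one-out selection procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation. It uses univariate (single-band) samples instead of multispectral vectors, so that the example needs only the standard library. It also assumes that the cluster assignments from one nearest-means run on all samples are held fixed while each sample is left out. Only the per-cluster mean, variance, and weight are re-estimated without that sample. The function names, the quantile initialization, `MIN_CLUSTER_SIZE`, and `VAR_FLOOR` are hypothetical choices made for this sketch.

```python
import math
import random

VAR_FLOOR = 1e-6      # hypothetical guard so a degenerate cluster cannot yield zero variance
MIN_CLUSTER_SIZE = 3  # hypothetical: after leaving one sample out, >= 2 remain for a variance


def nearest_means(xs, k, max_iter=100):
    """Cluster 1-D samples into k groups with the nearest-means (k-means) iteration."""
    s = sorted(xs)
    n = len(s)
    # Deterministic start: the midpoint quantile of each of k equal slices of the sorted data.
    means = [s[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest current mean.
        labels = [min(range(k), key=lambda j: (x - means[j]) ** 2) for x in xs]
        new_means = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            # An empty cluster keeps its previous mean.
            new_means.append(sum(members) / len(members) if members else means[j])
        if new_means == means:
            break
        means = new_means
    return labels


def loo_measure_of_fit(xs, labels, k):
    """Average log-likelihood of each sample under a mixture estimated without it."""
    n = len(xs)
    count, total, total_sq = [0] * k, [0.0] * k, [0.0] * k
    for x, lab in zip(xs, labels):
        count[lab] += 1
        total[lab] += x
        total_sq[lab] += x * x
    if min(count) < MIN_CLUSTER_SIZE:
        return None  # too few samples to estimate a left-out variance
    fit = 0.0
    for x, lab in zip(xs, labels):
        log_terms = []
        for j in range(k):
            c, s1, s2 = count[j], total[j], total_sq[j]
            if j == lab:  # remove the left-out sample from its own cluster's statistics
                c, s1, s2 = c - 1, s1 - x, s2 - x * x
            mean = s1 / c
            var = max((s2 - c * mean * mean) / (c - 1), VAR_FLOOR)
            weight = c / (n - 1)  # component weight from the remaining n - 1 samples
            log_terms.append(math.log(weight)
                             - 0.5 * math.log(2 * math.pi * var)
                             - (x - mean) ** 2 / (2 * var))
        # log-sum-exp of the weighted component densities = log mixture density at x
        top = max(log_terms)
        fit += top + math.log(sum(math.exp(t - top) for t in log_terms))
    return fit / n


def estimate_num_components(xs, max_k=6):
    """Return the number of clusters with the largest measure of fit, plus all fits."""
    fits = {}
    for k in range(1, max_k + 1):
        fit = loo_measure_of_fit(xs, nearest_means(xs, k), k)
        if fit is not None:
            fits[k] = fit
    return max(fits, key=fits.get), fits


# Usage on synthetic bimodal data (two well-separated normal modes).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
best_k, fits = estimate_num_components(xs)
print(best_k, {k: round(v, 3) for k, v in fits.items()})
```

On data like this, the measure of fit should rise sharply from one cluster to two, because a single Gaussian must stretch across both modes. Whether larger numbers of clusters score lower depends on the sample, which is why every candidate is evaluated rather than stopping at the first increase. Holding the cluster labels fixed keeps the sketch cheap. Each left-out estimate then only subtracts one sample from its cluster's running sums.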
