Abstract
When a set of patterns is collected for pattern recognition, the number of major clusters may be unknown and the set contains outliers. This paper proposes a method that estimates the number of major clusters appropriately under these conditions. Once a model describing the distribution of patterns is defined, its parameters can be estimated by maximum likelihood, and the number of parameters can be optimized by the Akaike information criterion (AIC) or the minimum description length (MDL), which in turn yields an estimate of the number of clusters. When the set of patterns contains outliers, however, they distort the parameter estimates and, accordingly, the estimated number of clusters. To reduce the effect of outliers, this paper also proposes two robust clustering methods (MARC1 and MARC2) based on the maximum-likelihood method for the multivariate normal mixture model. The number of clusters is then estimated by AIC and MDL using the parameters obtained from the clustering. Experimental results show that even if 45 percent of the patterns in each cluster are replaced by outliers, their effect on the parameter estimation can be reduced and an adequate number of clusters can be estimated. The limits of applicability of the proposed method are investigated, and results of applying it to region segmentation are presented.
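The model-selection step described above can be illustrated with a minimal sketch. It does not reproduce MARC1 or MARC2, whose details are not given in the abstract. It assumes the textbook forms AIC = -2 log L + 2p and MDL = -log L + (p/2) log n, and a normal mixture with a full covariance matrix per component; the function names are hypothetical, and the maximized log-likelihoods are assumed to come from a clustering step that is not shown.

```python
import math


def mixture_param_count(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional normal mixture.

    Assumes full covariance matrices: (k - 1) mixing weights,
    k * d mean components, and k * d * (d + 1) / 2 covariance entries.
    """
    return (k - 1) + k * d + k * d * (d + 1) // 2


def aic(loglik: float, p: int) -> float:
    """Akaike information criterion (textbook form, assumed here)."""
    return -2.0 * loglik + 2.0 * p


def mdl(loglik: float, p: int, n: int) -> float:
    """Minimum description length (textbook form, assumed here)."""
    return -loglik + 0.5 * p * math.log(n)


def select_k(logliks: dict, d: int, n: int, criterion: str = "aic") -> int:
    """Return the number of clusters that minimizes the chosen criterion.

    logliks maps each candidate k to the maximized log-likelihood
    obtained for that k by a clustering step not shown here.
    """
    def score(k: int) -> float:
        p = mixture_param_count(k, d)
        if criterion == "aic":
            return aic(logliks[k], p)
        return mdl(logliks[k], p, n)

    return min(logliks, key=score)
```

Both criteria trade goodness of fit against the number of parameters, so adding a cluster is accepted only when the gain in log-likelihood outweighs the penalty for the extra weights, means, and covariances.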