Abstract
In recent years, there has been growing interest in clustering uncertain objects. In contrast to traditional, 'sharp' data representation models, uncertain objects are modeled as probability distributions defined over uncertainty regions. In this context, a major issue is the poor efficiency of existing algorithms, which is mainly due to the expensive computation of distances between uncertain objects. In this work, we extend our earlier work, which defined a novel formulation of the problem of clustering uncertain objects based on minimizing the variance of the mixture models that represent the clusters being discovered. Analytical properties of the computation of variance for cluster mixture models are derived and exploited by a partitional clustering algorithm called MMVar. This algorithm achieves high efficiency because it does not need to employ any distance measure between uncertain objects. Experiments show that MMVar is scalable and outperforms state-of-the-art algorithms in terms of efficiency, while achieving better average performance in terms of accuracy. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 116–135, 2013
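To illustrate why a variance-based objective can avoid pairwise distances between uncertain objects, the following minimal sketch computes the variance of a cluster's mixture model from per-object moments alone. It is not the MMVar algorithm itself: the function name `mixture_variance`, the equal weighting of objects in the mixture, and the use of total variance (the sum of per-dimension variances) are assumptions made for illustration. The sketch relies only on the standard identity that the second moment of a mixture is the weighted average of the component second moments.

```python
def mixture_variance(means, variances):
    """Total variance of an equal-weight mixture of uncertain objects.

    means[i][j]     : mean of object i along dimension j
    variances[i][j] : variance of object i along dimension j
    Assumption (illustrative only): every object has the same mixture weight,
    and total variance is the sum of the per-dimension variances.
    """
    n = len(means)
    dims = len(means[0])
    total = 0.0
    for j in range(dims):
        # Mean of the mixture: average of the object means.
        mix_mean = sum(m[j] for m in means) / n
        # Second moment of the mixture: average of E[X^2] = var + mean^2.
        mix_second = sum(v[j] + m[j] ** 2 for m, v in zip(means, variances)) / n
        # Variance = second moment minus squared mean.
        total += mix_second - mix_mean ** 2
    return total


# Two one-dimensional uncertain objects with means 0 and 2, each with variance 1.
cluster_means = [[0.0], [2.0]]
cluster_variances = [[1.0], [1.0]]
print(mixture_variance(cluster_means, cluster_variances))
```

The computation needs one pass over the objects' means and variances, and no distance between any pair of objects is evaluated.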