Abstract

Additive clustering models provide a conceptually simple and representationally powerful approach to extracting features from similarity data. Objects are represented according to the presence or absence of a set of weighted features, and their observed similarities are modeled using the recovered features that objects have in common. Unlike partitioning or hierarchical clustering approaches, most approaches to additive clustering place no constraints on the way features may be assigned to objects. This representational freedom demands, however, that the issue of model complexity be addressed. Additive clustering models should be generated so as to balance the competing demands of goodness-of-fit and complexity, so that they remain open to substantive interpretation. This paper uses previous analytic results to derive a stochastic complexity measure for additive clustering models. This measure simultaneously takes into account the goodness-of-fit, the number of clusters, and the complexity associated with the patterns of cluster inclusion and overlap within the model. A new algorithm for fitting additive clustering models to similarity data is then developed, using the stochastic complexity measure to control the balance between goodness-of-fit and complexity. The ability of the algorithm to recover known features and weights is assessed using Monte Carlo techniques, and its application to empirical data is demonstrated using a previously examined data set that measures the similarities among kinship terms.
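As an illustration of the representation described above, the following minimal sketch computes model similarities as the summed weights of the features two objects share. It is not the paper's fitting algorithm and does not compute the stochastic complexity measure. The function names, the toy feature matrix, the weights, and the additive constant are all assumptions introduced here for illustration.

```python
from itertools import combinations


def predicted_similarity(features, weights, constant=0.0):
    """Return model similarities for every pair of objects.

    features[i][k] is 1 if object i has feature k and 0 otherwise.
    weights[k] is the weight of feature k.
    constant is an additive offset (an assumption of this sketch).
    """
    sims = {}
    for i, j in combinations(range(len(features)), 2):
        # A feature contributes its weight only when both objects have it.
        shared = sum(w * fi * fj
                     for w, fi, fj in zip(weights, features[i], features[j]))
        sims[(i, j)] = constant + shared
    return sims


def sum_squared_error(observed, predicted):
    """Simple goodness-of-fit: squared error summed over object pairs."""
    return sum((observed[pair] - predicted[pair]) ** 2 for pair in predicted)


# Hypothetical toy example: four objects, two overlapping features.
features = [
    [1, 0],  # object 0 has feature 0 only
    [1, 1],  # object 1 has both features
    [0, 1],  # object 2 has feature 1 only
    [0, 0],  # object 3 has neither feature
]
weights = [0.5, 0.3]
sims = predicted_similarity(features, weights, constant=0.1)
```

Object 1 belongs to both features here, which is the kind of overlap that partitioning approaches rule out. A fitting procedure would search over feature assignments and weights to reduce an error such as `sum_squared_error` while penalizing complexity.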
