Abstract

The current approach in numerical taxonomy is directed towards the so-called “minimum-variance” solution, for which it is argued that a population should be partitioned into cluster subsets by minimizing the total within-group variation. Several classification methods have been compared [1] and shown to possess related variance constraints, and a case has been made [1–3] that such methods are not ideally suited to the taxonomic problem of resolving “natural” classes. Implicit in the minimum-variance approach is the concept that clusters should have no significant overall variance or spread, which implies that a unimodal swarm should be split into an arbitrary number of compact sections. By contrast, Forgey has argued [2,3] that in a “natural” classification, clusters should correspond to data modes, so there can be only as many classes as there are distinct modes. No variance constraint is implied, nor should one be induced: when a mode is elongated rather than spherical, the distribution merely reflects some internal factor of variation within the corresponding class. Such factors will be present to some extent, depending on the data transformations and the quality of the selected character set, so a subsequent search over the variables is needed to discover the hidden constant characteristics of the class. Furthermore, the characters that are non-constant within a cluster mode may be intercorrelated, suggesting that the original choice of characters was poor; in such cases, correlations, ratio variables and regression coefficients should be considered. Forgey [2,3] interprets a data mode as a continuous dense swarm of points, separated from other such modes either by empty space or by a scattering of “noise” data.
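The claim that a minimum-variance criterion will keep splitting even a single-mode distribution can be checked numerically. The sketch below assumes one-dimensional data and contiguous, equal-sized groups (neither assumption comes from the paper), and the helper names are illustrative only. It computes the total within-group sum of squares W for a symmetric unimodal sample split into 1, 2, 4 and 8 groups. Because each split refines the previous one, W can only fall, so minimizing W alone gives no indication that a single class would suffice.

```python
from statistics import NormalDist, fmean


def within_group_ss(groups):
    """Total within-group sum of squares: squared deviations from each group's own mean, summed."""
    total = 0.0
    for g in groups:
        m = fmean(g)
        total += sum((x - m) ** 2 for x in g)
    return total


def contiguous_split(sorted_values, k):
    """Split already-sorted values into k contiguous groups of equal size (len must be divisible by k)."""
    size = len(sorted_values) // k
    return [sorted_values[i * size:(i + 1) * size] for i in range(k)]


# A deterministic unimodal "swarm": 64 evenly spaced quantiles of a standard normal (already sorted).
n = 64
data = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# Each k below halves every group of the previous k, so W cannot increase.
for k in (1, 2, 4, 8):
    print(k, round(within_group_ss(contiguous_split(data, k)), 3))
```

The printed W values shrink steadily as k grows, even though the data have only one mode.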
It has been suggested that “noise” data usually result from sampling errors; while this is true, they can also be interpreted as natural phenomena associated with the intersecting tails of disjoint continuous distributions. We can therefore expect a “natural” cluster to exhibit a dense centre (of any shape) surrounded by a haze or cloud of points, and the problem is to isolate the dense centres irrespective of this interference.
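Isolating dense centres from a surrounding haze can be illustrated with a generic density-threshold sketch. This is not the procedure of the paper: the k-th-nearest-neighbour density estimate, the parameters `k` and `radius`, the function names and the toy data are all assumptions chosen for illustration. A point counts as dense when its k-th nearest neighbour lies within `radius`; dense points within `radius` of one another are merged into one mode, and every remaining point is labelled noise (-1). No variance constraint is applied, so an elongated mode is kept as a single class.

```python
import math


def kth_nn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest other point (small value = dense region)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def dense_modes(points, k, radius):
    """Hypothetical mode finder: label dense points by connected mode, others as noise (-1)."""
    n = len(points)
    dense = [kth_nn_distance(points, i, k) <= radius for i in range(n)]
    labels = [-1] * n
    mode = 0
    for start in range(n):
        if not dense[start] or labels[start] != -1:
            continue
        # Flood-fill over dense points reachable in steps no longer than `radius`.
        stack = [start]
        labels[start] = mode
        while stack:
            i = stack.pop()
            for j in range(n):
                if dense[j] and labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    labels[j] = mode
                    stack.append(j)
        mode += 1
    return labels


# Toy data: an elongated strip, a compact grid, and three isolated "noise" points.
strip = [(i * 0.1, j * 0.1) for i in range(11) for j in range(2)]
grid = [(5 + i * 0.1, 5 + j * 0.1) for i in range(3) for j in range(3)]
noise = [(2.5, 2.5), (2.5, 0.0), (0.0, 5.0)]
points = strip + grid + noise

labels = dense_modes(points, k=3, radius=0.15)
print(labels)
```

With this toy data the elongated strip and the compact grid come out as two modes and the three isolated points as noise; on real data the outcome depends strongly on the choice of `k` and `radius`.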
