Abstract
The effects of constraints (external conditions imposed on, but not inherent in, the data) are subtle, and they are as yet insufficiently understood. In-line constraint arises when the character states of each Operational Taxonomic Unit are made to add up to some constant, as might be the case with percentage data. A simple transform is indicated. Cross constraint on a classification of OTUs would arise as a result of operations like row standardization. It shows up as a tendency for correlations to go negative, and such standardizations are best avoided if possible.

Numerical taxonomic techniques seem to be engaged not in classifying things but in analysing strings of numbers, and these strings are, in a certain sense, numerical models of the things. To make inferences about the things from analysis of the numbers, we must then be sure that our model is both sufficient, at least for the purposes we have set our classification, and also either undistorted or, at least, distorted in a way we understand. Types of distortion introduced into the model by the process of measurement or in the preparation of the data, but at any rate external to the thing, may be called constraints. Two types are examined in this paper, and both are named according to their relation to the data strings. In-line constraint arises when the elements of each string are forced to add up to some constant value. Cross constraint obtains when the sum of corresponding elements from each string is required to be a constant. Ordinarily, in numerical taxonomy, each Operational Taxonomic Unit (OTU) is modelled by a string of numbers arranged in a column, and each character by a row. If the characters have been standardized, row sums are constant, and this is cross constraint on the relation between OTUs. But if R-type analysis of the relations between rows were of interest, the data strings would be the rows, each of which would be in-line constrained after row standardization.
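Both kinds of constraint can be illustrated numerically. The sketch below is a minimal illustration, assuming NumPy and randomly generated data (it uses none of the seismic or rock data discussed in this paper, and it does not implement the transform the paper recommends). The helpers `close_rows`, `standardize_rows` and `mean_off_diagonal` are hypothetical names introduced here. In the first case, independent positive values for 50 samples of 5 constituents are closed to 100%, so each sample string is in-line constrained and the constituents are cross constrained. In the second, a 20-character by 6-OTU matrix is row-standardized, so every row sums to zero and the OTU columns are cross constrained. In both cases the covariances of any one column with all columns (itself included) must sum to zero, so the off-diagonal covariances, and with them the correlations, are pushed toward negative values.

```python
import numpy as np


def close_rows(x, total=100.0):
    """Rescale each row to sum to `total` (e.g. percentage data).

    Each row (sample) becomes in-line constrained; the columns
    (constituents) are then cross constrained.
    """
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=1, keepdims=True)


def standardize_rows(x):
    """Standardize each row (character) to mean 0 and unit variance.

    Every row then sums to 0, which cross constrains the columns (OTUs).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / sd


def mean_off_diagonal(cov):
    """Average of the off-diagonal entries of a square matrix."""
    n = cov.shape[0]
    return cov[~np.eye(n, dtype=bool)].mean()


rng = np.random.default_rng(0)

# Case 1 (hypothetical data): 50 samples (rows) x 5 constituents (columns),
# generated independently, then closed so that every sample sums to 100%.
raw = rng.uniform(1.0, 10.0, size=(50, 5))
closed = close_rows(raw)
cov_closed = np.cov(closed, rowvar=False)  # covariances between constituents

# Case 2 (hypothetical data): 20 characters (rows) x 6 OTUs (columns),
# then each character row is standardized.
chars = rng.normal(size=(20, 6))
std = standardize_rows(chars)
cov_std = np.cov(std, rowvar=False)  # covariances between OTUs

for name, cov in [("closed constituents", cov_closed),
                  ("row-standardized OTUs", cov_std)]:
    print(name)
    # Each row sum is zero (up to rounding) because the columns add to a constant.
    print("  row sums of covariance matrix:", np.round(cov.sum(axis=1), 10))
    # Hence the off-diagonal entries must average to -trace / (n * (n - 1)) < 0.
    print("  mean off-diagonal covariance: ", mean_off_diagonal(cov))
```

The negative average is forced by the arithmetic of the constraint, not by any relation among the original values, which were generated independently.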
A chemical analysis represents a case of constraint which is inherent in the data. The sum of constituents in each sample must be 100%, so the samples are in-line constrained and the relation between constituents is cross constrained. Other types of distortion, for example the partial logical correlations of Sokal and Sneath (1963), may also exist in the model and may be more important for taxonomic work. But in the two cases that drew my attention to constraints, namely the problem of discriminating explosions from earthquakes by multivariate analysis of the information on the seismogram, and the problem of classifying rock samples by their chemical constitution, the effects of constraint have proved to be both subtle and far-reaching. They should always be considered in assessing the reliability of a classification, and because constraint enters at the earliest stage, its effects will be felt in every method of multivariate analysis.

IN-LINE CONSTRAINT AND THE