The Classification of Constrained Data

R Underwood

doi:10.1093/sysbio/19.1.98a

Abstract

The effects of constraints (external conditions imposed on, but not inherent in, the data) are subtle, and they are as yet insufficiently understood. In-line constraint arises when the character states of each Operational Taxonomical Unit are made to add up to some constant, as might be the case with percentage data. A simple transform is indicated. Cross constraint on a classification of OTUs would arise as a result of operations like row standardization. It shows up as a tendency for correlations to go negative, and such standardizations are best avoided if possible.

Numerical taxonomic techniques seem to be engaged not in classifying things, but in analysing strings of numbers, which in a certain sense are numerical models of the things. To make inferences about the things from analysis of the numbers, we must then be sure that our model is both sufficient, at least for the purposes we have set our classification, and either undistorted or, at least, distorted in a way we understand. Types of distortion introduced into the model by the process of measurement, or in the preparation of the data, but at any rate external to the thing, may be called constraints. Two types are examined in this paper, and both are named according to their relation to the data strings. In-line constraint arises when the elements of each string are forced to add up to some constant value. Cross constraint obtains when the sum of corresponding elements from each string is required to be a constant.

Ordinarily, in numerical taxonomy, each Operational Taxonomical Unit (OTU) is modelled by a string of numbers arranged in a column, and each character by a row. If the characters have been standardized, row sums are constant, and this is cross constraint on the relation between OTUs. But if R-type analysis of the relations between rows were of interest, the data strings would be the rows, each of which would be in-line constrained after row standardization.
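The two constraints can be illustrated numerically. The following is a minimal sketch, not the paper's own procedure: it assumes NumPy, uses simulated independent values in place of real character data, and the helper name `mean_offdiag_corr` and the matrix sizes are hypothetical. It row-standardizes a characters-by-OTUs matrix, checks that row sums become constant (cross constraint on the OTU columns), and compares the average correlation between OTUs before and after. It also closes a second matrix to percentages so that each OTU column sums to 100 (in-line constraint).

```python
import numpy as np

def mean_offdiag_corr(x):
    """Average correlation between distinct columns of x (columns = OTUs)."""
    r = np.corrcoef(x, rowvar=False)
    n = r.shape[0]
    return (r.sum() - np.trace(r)) / (n * (n - 1))

rng = np.random.default_rng(0)
n_chars, n_otus = 2000, 5  # illustrative sizes, not taken from the paper

# Rows are characters, columns are OTUs; values are independent by construction.
raw = rng.normal(size=(n_chars, n_otus))

# Row standardization: zero mean and unit variance for each character across OTUs.
std = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, keepdims=True)

# Cross constraint: corresponding elements of the OTU columns now sum to a constant (0).
row_sums = std.sum(axis=1)

before = mean_offdiag_corr(raw)  # close to 0 for independent data
after = mean_offdiag_corr(std)   # pulled negative by the constraint

# In-line constraint: each OTU column is closed to percentages summing to 100.
positive = rng.uniform(1.0, 10.0, size=(6, n_otus))
closed = 100.0 * positive / positive.sum(axis=0, keepdims=True)

print(f"mean OTU correlation before standardization: {before:+.3f}")
print(f"mean OTU correlation after standardization:  {after:+.3f}")
print("column sums after closure:", closed.sum(axis=0))
```

Under these assumptions the average correlation between OTUs moves from near zero to roughly -1/(n_otus - 1), because columns that must sum to a constant cannot all vary together. This is only an illustration of the tendency described above, with no claim about its size in real taxonomic data.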
A chemical analysis represents a case of constraint which is inherent in the data. The sum of constituents in each sample must be 100%, so that samples are in-line constrained, and the relation between constituents is cross constrained. Other types of distortion, for example the partial logical correlations of Sokal and Sneath (1963), may also exist in the model, and may for taxonomic work be more important. But in the two cases that drew my attention to constraints, namely the problem of discriminating explosions from earthquakes by multivariate analysis of the information on the seismogram, and the problem of classifying rock samples by their chemical constitution, the effects of constraint have proved to be both subtle and far-reaching. They should always be considered in assessing the reliability of a classification, and because constraint enters at the earliest stage, the effects will be felt in every method of multivariate analysis.

IN-LINE CONSTRAINT AND THE

Journal: Systematic Biology	Publication Date: Mar 1, 1970
Citations: 3

