Abstract

This study explores options for choosing the number of principal coordinates, m, in the canonical analysis of principal coordinates (CAP), a useful procedure with wide-ranging application wherever multivariate data sets are collected or generated. Choosing too few coordinates (small m) in this constrained (i.e. hypothesis-based) ordination procedure may lead to inadequate separation of the groups (when used as a canonical discriminant analysis) or to inadequate correlation between explanatory and response variables (when used as a canonical correlation analysis), whereas choosing too many (large m) may lead to overparameterization, resulting in overfitting of the data and spurious relationships. It is shown here that the optimum number of principal coordinates is simply the one that results in the smallest P value in the canonical analysis carried out using permutations. When more than one value of m yields the same minimum P value, m should be chosen from that set as the number of principal coordinates that minimizes the leave-one-out residual sum of squares. This choice of m provides suitable solutions for each of the 17 case studies investigated here (which yielded 17 canonical discriminant analyses and 7 canonical correlation analyses).
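
The selection rule stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `choose_m`, its arguments, and the numbers in the usage example are hypothetical. The permutation P values and leave-one-out residual sums of squares are assumed to have been computed already, one per candidate m. If two candidates tie on both criteria, the sketch returns the one listed first, which is a choice made here and not something the abstract specifies.

```python
import math
from typing import Sequence


def choose_m(
    m_values: Sequence[int],
    p_values: Sequence[float],
    loo_rss: Sequence[float],
) -> int:
    """Pick m by smallest permutation P value, breaking ties by
    smallest leave-one-out residual sum of squares.

    All three sequences are hypothetical inputs, aligned by position:
    p_values[i] and loo_rss[i] belong to the candidate m_values[i].
    """
    if not m_values:
        raise ValueError("at least one candidate m is required")
    if not (len(m_values) == len(p_values) == len(loo_rss)):
        raise ValueError("inputs must have the same length")

    # Step 1: find the minimum P value across all candidate m.
    p_min = min(p_values)

    # Step 2: collect every candidate that attains that minimum.
    # isclose guards against floating-point noise in the P values.
    tied = [
        i for i, p in enumerate(p_values)
        if math.isclose(p, p_min, rel_tol=0.0, abs_tol=1e-12)
    ]

    # Step 3: among the tied candidates, minimize the leave-one-out
    # residual sum of squares (first listed wins any remaining tie).
    best = min(tied, key=lambda i: loo_rss[i])
    return m_values[best]


# Made-up numbers, for illustration only (not results from the study).
example_m = choose_m(
    m_values=[1, 2, 3, 4, 5],
    p_values=[0.12, 0.001, 0.001, 0.001, 0.03],
    loo_rss=[9.1, 6.4, 5.2, 5.9, 7.0],
)
```

In this made-up example, m = 2, 3 and 4 share the minimum P value, and m = 3 has the smallest leave-one-out residual sum of squares among them, so `example_m` is 3.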
