Abstract

Data occurring in the form of frequencies are common in genetics—for example, in serology. Examples are provided by the AB0 group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out using the available methods of multivariate analysis with usually three principal aims. One of these is to seek meaningful relationships between the components of a data set, the second is to examine relationships between populations from which the data have been obtained, the third is to bring about a reduction in dimensionality. This latter aim is usually realized by means of bivariate scatter diagrams using scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures because they represent compositions and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis (“raw” principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergencies between logarithmically based analyses and raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call