Abstract

Biodistance analysis can elucidate various aspects of past population structure. The most commonly adopted measure of divergence when estimating biodistances is the mean measure of divergence (MMD). The MMD is an unbiased estimator of population divergence, but this property is lost when the dataset includes variables with very high or very low frequencies. In the present paper, we examine new measures of divergence based on untransformed binary data and on the logit and probit transformations. A measure of divergence based on untransformed data is shown to be a better unbiased estimator of population divergence. The conventional MMD is a satisfactory distance measure for binary data; however, it may produce biased estimates of population divergence when many traits have frequencies lower than 0.1 and/or greater than 0.9. Finally, the measures of divergence based on the probit and logit transformations are usually biased estimators.

Highlights

  • Biodistance analysis examines the relatedness or distance of past populations, employing skeletal and dental phenotypic data

  • The mean measure of divergence (MMD) has two limitations: (a) it is based on the arcsine transformation, so when many traits have low or high frequencies it ceases to be an unbiased estimator of population divergence; (b) when traits are inter-correlated, the relationships used to determine the statistical significance of the distances may not be accurate

  • We examine three new measures of divergence, based on untransformed data and on the logit and probit transformations, and compare them to the MMD
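Limitation (a) can be illustrated numerically. The sketch below is not taken from the paper: it assumes the Anscombe form of the arcsine transformation, θ = arcsin(1 − 2(k + 3/8)/(n + 3/4)), and the usual approximation Var(θ) ≈ 1/(n + 0.5) that the MMD subtracts as a sampling correction. The function name `variance_ratio` and the chosen values of n and p are hypothetical. The code computes the exact variance of θ under a binomial model and divides it by the approximation.

```python
import math


def variance_ratio(n, p):
    """Exact Var[theta] for k ~ Binomial(n, p), divided by 1 / (n + 0.5).

    theta is the Anscombe arcsine transform (an assumption of this sketch).
    A ratio near 1 means the correction term 1 / (n + 0.5) is accurate.
    """
    mean = 0.0
    mean_sq = 0.0
    for k in range(n + 1):
        # Binomial probability of observing the trait k times in n individuals
        prob = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        theta = math.asin(1.0 - 2.0 * (k + 0.375) / (n + 0.75))
        mean += prob * theta
        mean_sq += prob * theta * theta
    variance = mean_sq - mean * mean
    return variance * (n + 0.5)


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.05, 0.02):
        print(f"p = {p:<5} variance ratio = {variance_ratio(50, p):.3f}")
```

Under these assumptions the ratio stays close to 1 for intermediate frequencies and drifts away from 1 as the trait frequency approaches 0 or 1, which is one way to see why the correction term, and hence the estimator, can become unreliable for rare or near-universal traits.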


Summary

Introduction

Biodistance analysis examines the relatedness or distance of past populations, employing skeletal and dental phenotypic data. These phenotypic data include metric and nonmetric traits, and are used as a proxy for the genotype under the assumption that phenotypic variability expresses phylogenetic variation (Relethford 2016). Any measure used to estimate biodistances should have two main properties: (a) it should be an unbiased estimator of population divergence, because although we estimate pairwise biodistances between samples, what interests us is the distance among the populations from which these samples derive; (b) it should provide the means to evaluate whether the estimated biodistances are statistically significant. The most common measure, mainly because it is an unbiased estimator of population divergence, is the mean measure of divergence (MMD). It was developed by Smith for use by M.S. Grewal (1962) in his estimation of biological divergence across generations of laboratory mice in sublines of the C57BL strain.
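For orientation, a minimal sketch of how an MMD between two samples is commonly computed is given here. It is an illustration under stated assumptions rather than the exact formulation used in this paper: it assumes the Anscombe variant of the arcsine transformation and the sampling correction 1/(n + 0.5) per sample, and the names `anscombe_theta` and `mmd`, as well as the trait counts, are hypothetical.

```python
import math


def anscombe_theta(k, n):
    """Arcsine (Anscombe) transform of a trait seen k times in n individuals."""
    return math.asin(1.0 - 2.0 * (k + 3.0 / 8.0) / (n + 3.0 / 4.0))


def mmd(sample_a, sample_b):
    """Mean measure of divergence between two samples.

    Each sample is a list of (k, n) pairs, one per trait, in the same order:
    k = individuals showing the trait, n = individuals scored for it.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must list the same, non-empty set of traits")
    total = 0.0
    for (k1, n1), (k2, n2) in zip(sample_a, sample_b):
        diff = anscombe_theta(k1, n1) - anscombe_theta(k2, n2)
        # Subtract the expected sampling variance so that two samples drawn
        # from the same population have an MMD near zero on average.
        total += diff**2 - (1.0 / (n1 + 0.5) + 1.0 / (n2 + 0.5))
    return total / len(sample_a)


if __name__ == "__main__":
    # Made-up counts for three traits in two samples of 50 individuals each
    site_a = [(5, 50), (40, 50), (20, 50)]
    site_b = [(30, 50), (10, 50), (22, 50)]
    print(f"MMD = {mmd(site_a, site_b):.4f}")
```

Because the correction term is subtracted for every trait, the MMD can be negative when two samples are very similar; such values are usually read as no detectable divergence.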

Materials and methods
Results for data transformations
Results from simulated data
Results concerning p value estimation