Abstract

When genetic resources are evaluated, data on continuous and categorical attributes of accessions are collected. The Ward-MLM strategy, which couples the Ward method with the modified location model (MLM), classifies genetic resources using a mixture of categorical and continuous variables. It combines all the categorical variables into one multinomial variable (W) that is assumed to be statistically independent of the continuous variables. The main objective of this study was to examine the robustness of the MLM for recovering the true underlying subpopulations when independence between variable W and the continuous variables does not hold. Several scenarios, based on different degrees of overlap of the continuous and discrete variables, were generated in simulated data sets. Results showed that when the subpopulations were well differentiated, the Ward-MLM strategy correctly predicted the true number of subpopulations and fully recovered their structure, regardless of the level of dependence between the W variable and the vector of continuous variables. When the subpopulations showed unclear boundaries and a high degree of overlap in the W variable and in the continuous variables, the Ward-MLM strategy predicted a different number of subpopulations but fully recovered the composition of the subpopulations. In this case, new groups were formed whose composition was balanced and consistent with that of the original subpopulations. The MLM proved to be a robust model under medium and strong dependence between the variable W and the vector of continuous variables, and under various kinds of overlap between subpopulations with respect to the continuous and discrete variables.
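
As a rough illustration of how several categorical variables can be collapsed into a single multinomial variable W, the sketch below assigns one cell of W to every combination of category levels. The attribute names, levels, and the helper `combine_into_w` are hypothetical and are not taken from the study; the cell numbering is an arbitrary choice made for the example.

```python
from itertools import product


def combine_into_w(levels_per_variable):
    """Map every combination of categorical levels to one cell index of W.

    The number of cells equals the product of the numbers of levels.
    Cell indices start at 1 and follow itertools.product ordering
    (an arbitrary convention chosen for this sketch).
    """
    cells = product(*levels_per_variable)
    return {cell: index for index, cell in enumerate(cells, start=1)}


# Hypothetical categorical attributes recorded on each accession.
seed_colour = ["white", "yellow", "purple"]   # 3 levels
growth_habit = ["erect", "prostrate"]         # 2 levels

w_cells = combine_into_w([seed_colour, growth_habit])

# Hypothetical accessions, each described by (seed colour, growth habit).
accessions = [("white", "erect"), ("purple", "prostrate"), ("yellow", "erect")]

# Each accession now carries a single multinomial value instead of two
# separate categorical values.
w_values = [w_cells[accession] for accession in accessions]

print(len(w_cells))  # number of cells in W
print(w_values)      # W cell assigned to each accession
```

With three colour levels and two habit levels, W has six cells, so each accession is described by one multinomial value alongside its continuous measurements.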
