In this computer note, I introduce PopCluster, software that implements a new likelihood method for unsupervised analysis of population structure from marker data. To infer coarse population structure, it assumes the mixture model and uses a simulated annealing algorithm to perform a maximum likelihood clustering analysis, partitioning the sampled individuals into a predefined number of clusters. To infer fine population structure, it further assumes the admixture model and uses an expectation maximisation (EM) algorithm to estimate individual admixture proportions. PopCluster has several notable features. First, it is one of only a couple of model-based methods that can handle both biallelic and multiallelic markers in the same framework. Second, it is the first population structure analysis method to use both Message Passing Interface (MPI) and OpenMP to exploit multiple CPUs with shared and distributed memory, giving it the capacity to handle genomic data with millions of individuals and millions of loci. Third, its algorithms for both mixture and admixture analyses are fast, making PopCluster computationally more efficient than previous methods for analysing genomic data. Fourth, PopCluster is available for Windows, Linux and Mac, and its Windows version has an integrated GUI that facilitates data input and conveniently visualises analysis results. Fifth, its Windows version has a built-in simulation module for generating genotype data under admixture, hybridization or migration models. PopCluster thus provides a valuable toolset for researchers to simulate, infer and visualise individual admixture, population genetic structure, hybridization and migration using marker data.
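To make the admixture step more concrete, here is a minimal sketch of a standard EM update for one diploid individual's admixture proportions. It assumes the cluster allele frequencies are already known and held fixed (for example, taken from a preceding mixture analysis). It is an illustration of the general technique, not PopCluster's implementation. The function name `estimate_admixture`, the genotype encoding, and the convergence settings are hypothetical. Alleles are stored as integer indices, so the same code handles biallelic and multiallelic loci.

```python
def estimate_admixture(genotype, freqs, max_iter=500, tol=1e-10):
    """EM estimate of one diploid individual's admixture proportions q.

    Hypothetical sketch. Assumes fixed cluster allele frequencies.
    genotype: per-locus pair of allele indices, e.g. [(0, 2), (1, 1)].
    freqs:    freqs[k][l][a] = frequency of allele a at locus l in cluster k.
    """
    K = len(freqs)
    q = [1.0 / K] * K  # start from equal proportions
    for _ in range(max_iter):
        expected = [0.0] * K  # expected number of gene copies from each cluster
        n_copies = 0
        for l, pair in enumerate(genotype):
            for a in pair:
                # E-step: posterior probability that this gene copy came from cluster k
                w = [q[k] * freqs[k][l][a] for k in range(K)]
                total = sum(w)
                if total == 0.0:
                    continue  # allele has zero probability under current q; skip it
                for k in range(K):
                    expected[k] += w[k] / total
                n_copies += 1
        if n_copies == 0:
            return q  # no informative gene copies
        # M-step: new proportions are the expected fractions of copies per cluster
        new_q = [e / n_copies for e in expected]
        converged = max(abs(x - y) for x, y in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


# Usage: two clusters, one triallelic locus and one biallelic locus.
freqs = [
    [[0.7, 0.2, 0.1], [0.6, 0.4]],  # cluster 0
    [[0.1, 0.3, 0.6], [0.2, 0.8]],  # cluster 1
]
print(estimate_admixture([(0, 2), (1, 1)], freqs))
```

Treating each gene copy as drawn from one cluster with probability q_k is what makes the E-step a simple normalised weighting. A full analysis would also update the allele frequencies, handle missing genotypes, and loop over all individuals.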