Abstract
Population structure is a commonplace feature of genetic variation data, and it has importance in numerous application areas, including evolutionary genetics, conservation genetics, and human genetics. Understanding the structure in a sample is necessary before more sophisticated analyses are undertaken. Here we provide a protocol for running principal component analysis (PCA) and admixture proportion inference, two of the most commonly used approaches for describing population structure. Through hands-on examples using the CEPH-Human Genome Diversity Panel and pragmatic caveats, readers will learn to analyze and visualize population structure in their own data.
Highlights
Population structure is a commonplace feature of genetic variation data, and it has importance in numerous application areas, including evolutionary genetics, conservation genetics, and human genetics
The protocol covers installing the visualization package and an R command that generates plots showing each individual sample's position in principal component analysis (PCA) space, together with the median position of each labelled group
In the resulting plot, several populations are separated by PC1 and PC2, with the more isolated populations being those that were most distinguished from the others by ADMIXTURE (Fig. 6)
Each analysis program (ADMIXTURE and smartpca) and each visualization package contains numerous other options that may be suitable for specific analyses, and we encourage readers to consult the manual of each
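The idea of placing each sample in PCA space and summarizing each labelled group by its median position can be illustrated with a minimal sketch. This is not the protocol's R command and not smartpca: it assumes a small simulated genotype matrix (allele counts 0/1/2), uses a plain mean-centered SVD in NumPy, and the function names `pca_coordinates` and `group_medians` are hypothetical names chosen for this example.

```python
import numpy as np


def pca_coordinates(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: array of shape (n_samples, n_snps) holding allele counts.
    Only mean-centering is applied here (an assumption of this sketch);
    dedicated tools may apply additional per-SNP normalization.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    # SVD of the centered matrix: u * s gives the sample scores on each PC.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def group_medians(coords, labels):
    """Median PCA position of each labelled group."""
    labels = np.asarray(labels)
    return {
        str(lab): np.median(coords[labels == lab], axis=0)
        for lab in np.unique(labels)
    }


# Simulated toy data (illustrative only): two groups of 10 samples
# whose allele frequencies differ across 200 SNPs.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.3, size=200), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(10, 200)),
    rng.binomial(2, freq_b, size=(10, 200)),
])
labels = ["A"] * 10 + ["B"] * 10

coords = pca_coordinates(geno)
medians = group_medians(coords, labels)
for name, position in medians.items():
    print(name, position)
```

Plotting `coords` as a scatter and overlaying the entries of `medians` would give a picture analogous in spirit to the one described above, though real analyses should follow the options documented for the tools actually used.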
Summary
Population structure is a commonplace feature of genetic variation data, and it has importance in numerous application areas, including evolutionary genetics, conservation genetics, and human genetics. Population structure is the existence of differing levels of genetic relatedness among some subgroups within a sample. This may arise for a variety of reasons, but a common cause is that samples have been drawn from geographically isolated groups or different locales across a geographic continuum. Regardless of the cause, understanding the structure in a sample is necessary before more sophisticated analyses are undertaken. Two of the most commonly used approaches to describe population structure in a sample are principal component analysis [5, 16, 23, 25] and admixture proportion inference [19, 26]. Principal component analysis reduces a multi-dimensional dataset to a much smaller number of dimensions that allows for visual