Abstract
The definition of European population genetic substructure and its application to understanding complex phenotypes is becoming increasingly important. In the current study, using over 4,000 subjects genotyped for 300,000 single-nucleotide polymorphisms (SNPs), we provide further insight into relationships among European population groups and identify sets of SNP ancestry informative markers (AIMs) for application in genetic studies. In general, the graphical description of the principal components analyses (PCA) of diverse European subjects showed a strong correspondence to the geographical relationships of specific countries or regions of origin. Clearer separation of different ethnic and regional populations was observed when northern and southern European groups were considered separately. The PCA results were also influenced by the inclusion or exclusion of different self-identified population groups, including Ashkenazi Jewish, Sardinian, and Orcadian ethnic groups. SNP AIM sets were identified that could distinguish the regional and ethnic population groups. Moreover, the studies demonstrated that most allele frequency differences between different European groups could be controlled effectively in analyses using these AIM sets. The European substructure AIMs should be widely applicable to ongoing studies to confirm and delineate specific disease susceptibility candidate regions without the need to perform additional genome-wide SNP studies in additional subject sets.
Highlights
Over the last several years there has been substantial progress in using genotypes to ascertain population genetic substructure and in applying this information to association testing [1,2,3,4,5,6,7,8,9].
Complete data including the standard deviation is provided in Supplemental Table 2. (b) Population groups included Druze, Bedouin (BDN), Palestinian (PAL), Ashkenazi Jewish American (AJA), Greek (GRK), Italian (ITN), Adygei (ADY), Spanish (SPN), Basque (BAS), IRISH, German (GERM), Eastern European (EEUR), Russian (RUS), Swedish (SWED), Orcadian (ORC), Sardinian (SARD), and Tuscan (TUSC).
Paired Fst values were determined between 18 population groups that were typed with genome-wide single-nucleotide polymorphism (SNP) arrays.
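As an illustration of what a paired Fst value measures, the following is a minimal sketch, not the study's own procedure. The text here does not state which Fst estimator was used, so the sketch assumes Hudson's estimator computed as a ratio of sums across SNPs. The function name `hudson_fst`, the 0/1/2 genotype coding, and the simulated populations are all hypothetical.

```python
import numpy as np

def hudson_fst(geno_1, geno_2):
    """Paired Fst between two groups (assumed estimator: Hudson, ratio of sums).

    geno_1, geno_2: arrays of shape (n_individuals, n_snps) holding
    0/1/2 counts of one allele per individual (an assumed coding).
    """
    g1 = np.asarray(geno_1, dtype=float)
    g2 = np.asarray(geno_2, dtype=float)
    # Sample sizes are counted in chromosomes: two per diploid individual.
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]
    p1 = g1.sum(axis=0) / n1          # allele frequency per SNP, group 1
    p2 = g2.sum(axis=0) / n2          # allele frequency per SNP, group 2
    # Squared frequency difference, corrected for sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

# Simulated (hypothetical) data: two groups with drifted allele frequencies.
rng = np.random.default_rng(1)
n_snps = 1000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, n_snps))
pop_b = rng.binomial(2, freq_b, size=(30, n_snps))

fst_ab = hudson_fst(pop_a, pop_b)   # differentiated groups
fst_aa = hudson_fst(pop_a, pop_a)   # a group compared with itself
```

Comparing a group with itself gives a slightly negative value, because the sampling correction is subtracted from a zero frequency difference. Values for closely related groups are therefore expected to sit near zero.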
Summary
Over the last several years there has been substantial progress in using genotypes to ascertain population genetic substructure and in applying this information to association testing [1,2,3,4,5,6,7,8,9]. These studies have been advanced by the availability of efficient platforms for genotyping several hundred thousand SNPs, increased efforts in sampling various population groups, and application of both highly supervised (clustering algorithms) and largely unsupervised methods for analyzing high dimensional data (that is, genotypes in many individuals) [1,3,6]. Studies by multiple groups, including our own, have utilized principal components analyses (PCA) or multidimensional scaling to further define European population substructure using several hundred thousand SNPs that are present in genome-wide panels.
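The kind of PCA described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genotypes are assumed to be coded as 0/1/2 allele counts, each SNP is centered and scaled by its estimated allele frequency (a common normalization, assumed here), and the function name `genotype_pca` and the simulated data are hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: array of shape (n_individuals, n_snps) with 0/1/2
    allele counts (an assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0              # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)              # drop SNPs with no variation
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    # SVD of the normalized matrix; left vectors times singular values
    # give each individual's principal component scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated (hypothetical) data: two groups with drifted allele frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, n_snps))
pop_b = rng.binomial(2, freq_b, size=(30, n_snps))

pcs = genotype_pca(np.vstack([pop_a, pop_b]))
```

Plotting the two columns of `pcs` against each other gives the kind of graphical description the abstract refers to, with individuals from differentiated groups expected to fall in separate regions of the plot.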