Abstract

Background: Many differences between different ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs. However, the genetic bases of such differences have been under-investigated. Since the HapMap project, large-scale genotype data from Caucasian, African and Asian population samples have been available. The project found that these populations were located in different areas of the PCA (Principal Component Analysis) plot. However, as an unsupervised method, PCA does not measure the differences in each single nucleotide polymorphism (SNP) among populations.

Results: We applied an advanced mutual information-based feature selection method to detect associations between SNP status and ethnic groups using the latest HapMap Phase 3 release version 3, which included more sub-populations. A total of 299 SNPs were identified, and they accurately predicted the ethnicity of all HapMap populations. The 10-fold cross-validation accuracy of the SMO (sequential minimal optimization) model on the training dataset was 0.901, and the accuracy on the independent test dataset was 0.895.

Conclusions: In-depth functional analysis of these SNPs and their nearby genes revealed the genetic bases of skin and eye color differences among populations.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2328-0) contains supplementary material, which is available to authorized users.
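To illustrate what a mutual information score between SNP status and ethnic group looks like, here is a minimal sketch. It computes only the plain mutual information for a single SNP; it is not the feature selection method used in the study, whose details are not given in this abstract. The function name `mutual_information` and the toy genotypes are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(snp, groups):
    """Mutual information (in bits) between two discrete sequences.

    snp    -- minor-allele counts per sample (hypothetical toy data)
    groups -- population label per sample (hypothetical toy data)
    """
    if len(snp) != len(groups) or not snp:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(snp)
    count_x = Counter(snp)                 # marginal counts of SNP status
    count_y = Counter(groups)              # marginal counts of group labels
    count_xy = Counter(zip(snp, groups))   # joint counts
    mi = 0.0
    for (x, y), n_xy in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += (n_xy / n) * log2(n_xy * n / (count_x[x] * count_y[y]))
    return mi


# Toy example: a SNP that perfectly separates two groups of equal size.
separating = mutual_information([0, 0, 2, 2], ["A", "A", "B", "B"])
# Toy example: a SNP that is independent of the group label.
uninformative = mutual_information([0, 1, 0, 1], ["A", "A", "B", "B"])
```

A SNP that carries more information about the population label receives a higher score, which is the general idea behind ranking SNPs with a mutual information criterion.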

Highlights

  • Many differences between different ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs

  • Among different populations, population-specific single nucleotide polymorphisms (SNPs) account for 15 % of all SNPs, and common SNPs account for 85 %; both types contribute to various characteristics, including drug resistance and skin color [12, 13]

  • To reduce the number of SNPs and remove irrelevant SNPs that did not differ among ethnic groups, we calculated Cramér's V coefficient, which measures the univariate association between SNP status (i.e., the number of minor alleles) and ethnic group category in the training dataset
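The Cramér's V filter described in the last highlight can be sketched as follows. This is a minimal illustration using the standard definition V = sqrt(chi² / (n · (min(r, c) − 1))), not the authors' code; the function name `cramers_v` and the toy genotypes are assumptions, and the cutoff used in the study is not stated here.

```python
from collections import Counter
from math import sqrt


def cramers_v(snp, groups):
    """Cramér's V between SNP status and ethnic group labels.

    snp    -- minor-allele counts (0, 1 or 2) per sample (toy data)
    groups -- ethnic group label per sample (toy data)
    """
    if len(snp) != len(groups) or not snp:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(snp)
    row = Counter(snp)                 # counts per SNP status
    col = Counter(groups)              # counts per ethnic group
    obs = Counter(zip(snp, groups))    # observed contingency table
    k = min(len(row), len(col)) - 1
    if k == 0:
        # Only one category on one axis: no association can be measured.
        return 0.0
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            chi2 += (obs[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Toy example: a SNP whose status fully determines the group.
strong = cramers_v([0, 0, 2, 2], ["A", "A", "B", "B"])
# Toy example: a SNP that does not differ between groups.
weak = cramers_v([0, 1, 0, 1], ["A", "A", "B", "B"])
```

V ranges from 0 (no association) to 1 (complete association), so SNPs whose value falls below a chosen cutoff could be dropped as irrelevant before further feature selection.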


Summary

Introduction

Many differences between different ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs. A single nucleotide polymorphism (SNP) is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 %) of a large population. Along with the rapid development of next-generation DNA sequencing technologies, [...] Among humans, 99.9 % of the bases in the entire genome are remarkably similar; it is the remaining 0.1 % of the bases that makes a person unique [9]. Among this 0.1 % of bases, more than 90 % are SNPs [10].

