Abstract

Population structure can be revealed using single nucleotide polymorphisms (SNPs), which are genetic variations found in the DNA sequences of individuals. Because the number of SNPs is large, SNP data are usually visualized after dimensionality reduction. Although Principal Component Analysis (PCA) has been used extensively for SNP data visualization, some other dimensionality reduction methods have been shown to be more successful in revealing complex population structures. Nevertheless, these techniques often suffer either from a reduced ability to preserve the global structure of the SNP data, namely the relative genetic distances between subpopulations, or from high computational cost. In this work, a method that applies Multidimensional Scaling (MDS) to smoothed PCA-transformed data (MSSPD) is proposed. MSSPD successfully reveals population structures in 2D maps, while being more effective than other techniques in preserving the global structure. In terms of computational efficiency, MSSPD is comparable to the fastest SNP visualization methods.
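
As a rough illustration of the pipeline named above (PCA, then smoothing, then MDS), the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not specify how the PCA-transformed data are smoothed or which MDS variant is used, so the k-nearest-neighbour averaging, the classical (eigendecomposition-based) MDS, the parameter values, and all function names here are assumptions chosen for illustration. The genotype matrix is synthetic.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project mean-centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def knn_smooth(Z, k):
    """ASSUMED smoothing step (not taken from the paper): replace each
    sample by the mean of its k nearest neighbours and itself."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k + 1]  # nearest entry is normally the sample itself
    return Z[idx].mean(axis=1)

def classical_mds(Z, n_dims=2):
    """Classical MDS on Euclidean distances (ASSUMED variant)."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ d2 @ J                    # double-centred Gram matrix
    w, v = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_dims]     # largest eigenvalues first
    return v[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

def sketch_embedding(X, n_components=10, k=5):
    """Hypothetical end-to-end sketch: PCA -> smoothing -> 2D MDS."""
    Z = pca_transform(X, n_components)
    return classical_mds(knn_smooth(Z, k), n_dims=2)

# Synthetic genotypes (0/1/2 allele counts) for three populations of 30
# individuals each, with population-specific allele frequencies.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=(3, 200))
X = np.vstack([rng.binomial(2, f, size=(30, 200)) for f in freqs]).astype(float)
Y = sketch_embedding(X)  # one 2D coordinate pair per individual
```

In this sketch the smoothing step pulls each sample toward its local neighbourhood before MDS is applied to pairwise distances. Whether this matches the smoothing used in MSSPD cannot be determined from the abstract alone.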
