Abstract

Single Nucleotide Polymorphisms (SNPs) are an important component of a genome’s information and have been used extensively in genetics for population structure analysis. Visualizing SNP data helps detect population substructures, but SNP sequences contain thousands or millions of data points, so they are usually visualized after dimensionality reduction. Principal Component Analysis (PCA) has traditionally been used to reduce the data to 2D or 3D, with acceptable results. However, visualizing complex population structures requires more advanced techniques. Recently, t-Distributed Stochastic Neighbor Embedding (t-SNE) has been used for SNP visualization. In this work, a Multidimensional Scaling (MDS)-based method is presented and compared with t-SNE. Although both techniques successfully reveal population substructures in 2D, the MDS-based method better preserves the relative similarity between populations.
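
To make the idea of an MDS-style 2D embedding concrete, the following is a minimal sketch of classical (Torgerson) MDS applied to a small synthetic genotype matrix. It is not the method presented in this work: the function name `classical_mds`, the synthetic allele frequencies, the 0/1/2 genotype coding, and the mean absolute difference used as the distance are all assumptions chosen only for illustration.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points from a pairwise distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram-like matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Symmetric eigendecomposition; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise for non-Euclidean distances.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical data: two populations of 20 individuals, 500 SNPs each,
# genotypes coded as 0, 1, or 2 copies of the minor allele.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, size=500)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=500), 0.01, 0.99)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(20, 500)),
    rng.binomial(2, freq_b, size=(20, 500)),
]).astype(float)

# Assumed distance: mean absolute genotype difference between individuals.
dist = np.abs(geno[:, None, :] - geno[None, :, :]).mean(axis=2)

coords = classical_mds(dist, n_components=2)  # one 2D point per individual
```

Each row of `coords` can be plotted as a point and colored by population label. Because classical MDS works from the full pairwise distance matrix, distances between groups in the plot reflect the input distances more directly than in a neighbor-based embedding such as t-SNE.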
