Big Data Driven Oriented Graph Theory Aided tagSNPs Selection for Genetic Precision Therapy

Tianshuo Cong, Yong Ren, Tong Bai, Yifei Mu, Jingjing Wang, Sanghai Guan

doi:10.1109/access.2018.2886926

Open Access

https://doi.org/10.1109/access.2018.2886926

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 1	License type: cc-by-nc-nd

Affiliation: Tsinghua University, University of Southampton

Abstract

In recent years, human genome-related projects have been launched and carried out worldwide. Gene-sequencing techniques play a critical role in disease diagnosis, prediction, and population stratification, all of which rely on efficiently mining genetic features from the gene pool. Exploring the association between genetic mutation sites and disease-based population classification has become an active research topic, since it supports disease diagnosis and treatment at the molecular level. However, even a single chromosome in the human gene pool contains numerous variable sites, and hence traditional classifiers cannot mine all single nucleotide polymorphism (SNP) sites unless the characteristic SNP sites in each SNP cluster, termed tagSNPs, are first identified. In this paper, we apply big data mining techniques to this problem. First, we propose a principal component analysis-based algorithm that reduces the dimension of the gene data so that SNP sites can be clustered in a low-dimensional space. Second, we design an oriented graph theory-based tagSNPs selection algorithm. Finally, on the real-world 1000 Genomes Project dataset, the complete process of our SNP classifier yields fewer tagSNPs than traditional methods.
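To make the first step of the abstract concrete, the following is a minimal sketch of reducing SNP data with principal component analysis and then clustering SNP sites in the reduced space. It is not the authors' algorithm: the genotype encoding (0/1/2 minor-allele counts), the toy matrix, the helper names `pca_project` and `kmeans`, and the use of k-means as the clustering step are all assumptions made for illustration. The oriented graph-based tagSNPs selection is not sketched, because the abstract does not describe it in enough detail.

```python
import numpy as np

# Hypothetical toy data: rows are SNP sites, columns are individuals,
# entries are minor-allele counts (0, 1, or 2). Sites 0-2 and sites 3-5
# follow two different patterns across individuals.
genotypes = np.array([
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 1],
    [2, 0, 0, 2, 0, 0],
    [2, 0, 0, 2, 0, 0],
    [2, 0, 1, 2, 0, 0],
], dtype=float)


def pca_project(data, n_components):
    """Project each row of `data` onto its top principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Reduce each SNP site from 6 dimensions (individuals) to 2, then cluster.
projected = pca_project(genotypes, n_components=2)
labels = kmeans(projected, k=2)
print(labels)
```

In this toy setting, SNP sites whose genotypes vary together across individuals fall into the same cluster. A tagSNPs selection step would then choose representative sites from each cluster.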

Full Text