Abstract

In genome-wide association studies, the number of single nucleotide polymorphisms (SNPs) has increased drastically. The data may contain many near-duplicate SNPs in linkage disequilibrium, which complicates analysis and can cause statistical problems in downstream analyses. Principal component analysis (PCA) is a popular dimension reduction technique and is well known to be effective for many genetic association analyses. However, each principal component (PC) is a linear combination of all the original variables, so it cannot be interpreted directly in terms of the original variables. The purpose of our study is to eliminate the redundant SNPs and select a smaller subset consisting of only the informative SNPs. We propose an unsupervised SNP selection algorithm based on the principal variable (PV) method. It achieves dimensionality reduction by selecting a subset of the original variables, called PVs, that preserves as much information as possible. To find an optimal subset of SNPs, we focus on the criterion that minimizes the squared norm of the partial covariance matrix. We define a PC cluster by principal component analysis and choose as its representative the SNP with high loadings, on average, on the important principal components. After discarding the other SNPs in the PC cluster, we calculate the partial covariance matrix of the remaining variables given the principal variable. To obtain the next representative SNP, the same procedure is applied to the partial covariance matrix. The process repeats until no variables remain to be selected or a stopping criterion is met, namely the percentage of variance explained, measured by the trace or the squared norm of the covariance matrix. The resulting subset of SNPs can be used in further analyses for multiple purposes, such as the study of gene-gene interactions. We illustrate the proposed method using real genotype data and compare its performance with five existing selection methods for principal variables.
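
The iterative procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how a PC cluster is formed or how many principal components count as important, so the sketch assumes a simplified rule in which only the leading principal component is used, the cluster is every remaining variable whose absolute loading is at least `loading_cutoff` times the largest one, and the representative is the variable with the largest absolute loading. The function names, `loading_cutoff`, and `var_explained` are hypothetical, and only the trace-based stopping criterion is shown.

```python
import numpy as np


def partial_covariance(S, j):
    """Partial covariance of all variables given variable j.

    Standard identity for conditioning on a single variable:
    S - s_j s_j^T / s_jj, where s_j is column j of S.
    Row and column j of the result are zero.
    """
    s_j = S[:, [j]]
    return S - (s_j @ s_j.T) / S[j, j]


def select_principal_variables(X, var_explained=0.9, loading_cutoff=0.7, tol=1e-10):
    """Greedy principal-variable selection (illustrative sketch).

    X is an (n_samples, n_variables) array, e.g. genotypes coded 0/1/2.
    Returns the column indices of the selected variables.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)
    total = np.trace(S)
    remaining = list(range(S.shape[0]))
    selected = []

    while remaining:
        # Trace-based stopping rule: share of total variance already
        # explained by the selected variables.
        if 1.0 - np.trace(S) / total >= var_explained:
            break

        sub = S[np.ix_(remaining, remaining)]
        eigvals, eigvecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
        if eigvals[-1] <= tol:
            break  # no residual variance left among the remaining variables

        # ASSUMPTION: the cluster is defined from the leading PC only.
        loadings = np.abs(eigvecs[:, -1])
        in_cluster = loadings >= loading_cutoff * loadings.max()
        rep = remaining[int(np.argmax(loadings))]
        selected.append(rep)

        # Condition on the representative, then drop its whole cluster.
        S = partial_covariance(S, rep)
        remaining = [v for k, v in enumerate(remaining) if not in_cluster[k]]

    return selected
```

Calling `select_principal_variables(genotypes, var_explained=0.95)` on a samples-by-SNPs matrix returns column indices into that matrix. Replacing the trace with the squared Frobenius norm of the partial covariance matrix would give a sketch of the other stopping criterion mentioned in the abstract.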
