Abstract
Background: Single Nucleotide Polymorphisms (SNPs) are sequence variations found in individuals at specific points in the genomic sequence. As SNPs are highly conserved throughout evolution and within a population, the map of SNPs serves as an excellent genotypic marker. Conventional SNP analysis methods suffer from long run times, inefficient memory usage, and frequent overestimation. In this paper, we propose efficient, scalable, and reliable algorithms to select, from a large set of SNPs, a small subset that can together be employed to perform phenotypic classification.

Methods: Our algorithms exploit the techniques of gene selection and random projections to identify a meaningful subset of SNPs. To the best of our knowledge, these techniques have not been employed before in the context of genotype-phenotype correlations. Random projections are used to project the input data into a lower dimensional space while closely preserving distances. Gene selection is then applied to the projected data to identify a subset of the most relevant SNPs.

Results: We have compared the performance of our algorithms with Multifactor Dimensionality Reduction (MDR), one of the best currently known algorithms, and with the Principal Component Analysis (PCA) technique. Experimental results demonstrate that our algorithms are superior in terms of accuracy as well as run time.

Conclusions: In our proposed techniques, random projection is used to map data from a high dimensional space to a lower dimensional space, which overcomes the curse of dimensionality. From this space of reduced dimension, we select the best subset of attributes. This is a unique mechanism in the domain of SNP analysis, and to the best of our knowledge it has not been employed before. As revealed by our experimental results, our proposed techniques offer the potential of high accuracy while keeping run times low.
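The random projection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian projection matrix scaled by 1/sqrt(k), genotypes coded as 0, 1, or 2 copies of the minor allele, and hypothetical helper names (`random_projection_matrix`, `project`, `euclidean`). It only shows that pairwise distances between individuals are roughly preserved after projection; the subsequent gene selection step is not shown.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a d x k matrix with i.i.d. N(0, 1/k) entries (assumed Gaussian scheme)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def project(X, R):
    """Multiply each row of X (n x d) by R (d x k), giving an n x k matrix."""
    k = len(R[0])
    projected = []
    for row in X:
        y = [0.0] * k
        for x_j, r_j in zip(row, R):
            if x_j:  # genotype 0 contributes nothing, so skip it
                for c in range(k):
                    y[c] += x_j * r_j[c]
        projected.append(y)
    return projected


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data: 6 individuals, 400 SNPs, genotypes coded 0/1/2 (an assumption).
rng = random.Random(0)
n_individuals, n_snps, k = 6, 400, 100
X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_individuals)]

R = random_projection_matrix(n_snps, k, rng)
Y = project(X, R)

# Compare one pairwise distance before and after projection.
before = euclidean(X[0], X[1])
after = euclidean(Y[0], Y[1])
print(f"distance in {n_snps} dims: {before:.2f}, in {k} dims: {after:.2f}")
```

A larger target dimension k tightens the agreement between the two distances, at the cost of more computation in whatever step follows.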
Highlights
Single Nucleotide Polymorphisms (SNPs) are sequence variations found in individuals at some specific points in the genomic sequence
After that, we identify the best 32 single-nucleotide polymorphisms (SNPs) using the feature selection algorithm and validate these SNPs against the top SNPs found in the previous step based on p-values
The p-value calculation is based on a logistic-regression test; each p-value is calculated on a single SNP, which is equivalent to a chi-square test
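As a rough illustration of a single-SNP p-value, the sketch below runs a Pearson chi-square test on a 2 x 3 table of phenotype (control/case) by genotype. It is a stand-in under stated assumptions, not the logistic-regression test used in the paper: genotypes are assumed to be coded 0/1/2 and treated as three categories (2 degrees of freedom), and the function names `genotype_table` and `chi_square_2x3` are hypothetical. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def genotype_table(genotypes, phenotypes):
    """Count genotypes (0, 1, 2) separately for controls (0) and cases (1)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, phenotypes):
        table[y][g] += 1
    return table


def chi_square_2x3(table):
    """Pearson chi-square statistic and p-value for a 2 x 3 table (df = 2)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Toy counts (made up for illustration): rows are controls and cases,
# columns are genotypes 0, 1, 2.
toy_table = [[30, 20, 10], [10, 20, 30]]
stat, p_value = chi_square_2x3(toy_table)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

Running such a test once per SNP and sorting by p-value gives a ranked list that can be compared with the SNPs chosen by a feature selection algorithm.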
Summary
Single Nucleotide Polymorphisms (SNPs) are sequence variations found in individuals at specific points in the genomic sequence. A single-nucleotide polymorphism (SNP) is defined as a DNA sequence variation where a single nucleotide (A, T, C, or G) in the genomic sequence differs among the individuals of a biological species. It is the most common type of genetic variation among people. Although candidate gene studies have their own inherent limitations (reviewed in [7]), the use of smaller focused arrays possibly represents a more practical approach for many studies than the use of large scale arrays such as those in genome wide association studies (GWAS). According to [8], the panel of SNPs that we use in our study is able to extract full haplotype information for candidate genes in alcoholism, other addictions, and disorders of mood and anxiety.