Abstract

<p>A single nucleotide polymorphism (SNP) is a variation at a single nucleotide position in the genome. Functional SNPs are among the most important molecular markers in disease research and are widely used in fields such as tumor pathogenesis, disease diagnosis and treatment, prognostic evaluation, and drug development. Functional SNPs are far more numerous in noncoding regions of the genome than in coding regions, and they are harder to detect. In this work, we propose a computational method based on multi-feature mining to predict functional SNPs in noncoding regions of the human genome. We first analyzed the sequence properties, evolutionary conservation properties, and epigenetic modification signal properties of the sample SNPs. We then used statistical methods together with multiple genomic and epigenetic annotation data sets to mine high-dimensional discriminative features. In particular, allele-specific features were designed to distinguish the functions of SNPs located close to one another. The random forest method was used for both feature dimension reduction and classification. In 10-fold cross-validation, the Area Under the Receiver Operating Characteristic Curve (AUC) of our method improved by 16.9% and 43.4% over the existing methods GWAVA and CADD, respectively, illustrating that allele-specific features can help to distinguish functional from neutral SNPs at nearby locations.</p>
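The AUC used as the evaluation metric above can be illustrated with a minimal sketch. It relies on the standard rank interpretation of AUC: the probability that a randomly chosen functional SNP receives a higher score than a randomly chosen neutral one, with ties counted as one half. The function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not data or code from this work.

```python
def auc(labels, scores):
    """Return the AUC for binary labels (1 = functional, 0 = neutral).

    Computed as the fraction of (functional, neutral) pairs in which the
    functional SNP has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one functional and one neutral SNP")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three functional and three neutral SNPs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc(labels, scores)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for large SNP sets.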

