Abstract

Single nucleotide polymorphisms (SNPs) are the most prevalent and stable class of genetic variation in most organisms. Functional SNPs are the most commonly used genetic markers for diversity studies and molecular breeding in plants, and their rapid identification is in high demand. In this work, we present a machine-learning-based computational approach for identifying functional SNPs in the rice genome. To characterize and prioritize variants, two categories of features are extracted: nucleotide-sequence-based features and allele-specific features. In particular, a weighted Euclidean distance is used to measure the changes in transcription factor (TF) binding affinities caused by SNPs. To handle classification on unbalanced data, a support vector machine (SVM) is combined with an oversampling method. We use minimum redundancy maximum relevance (mRMR) feature selection to find the optimal feature set, and the results show that our method achieves a sensitivity of ~74.2% and a specificity of ~72.3% under 10-fold cross-validation. Furthermore, the data needed to build the proposed prediction model are mainly the sequence context of each SNP and the TF profiles in the JASPAR database, both of which are easy to obtain. The prediction method can therefore be readily applied to other plant species.
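As an illustration of the weighted Euclidean distance mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that each allele is summarized by a vector of binding scores (one score per sequence window overlapping the SNP) and that the distance is sqrt(sum_i w_i (u_i - v_i)^2). The matrix `toy_pwm`, the helper `window_scores`, the example sequences, and the uniform weights are all hypothetical; the abstract does not specify how the scores or weights are derived.

```python
import math


def weighted_euclidean(u, v, w):
    """Return sqrt(sum_i w_i * (u_i - v_i)^2) for equal-length sequences."""
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    return math.sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))


def window_scores(seq, pwm):
    """Score every window of length len(pwm) by summing per-position values.

    Hypothetical helper: a real pipeline would score against JASPAR profiles.
    """
    k = len(pwm)
    return [
        sum(pwm[i][seq[start + i]] for i in range(k))
        for start in range(len(seq) - k + 1)
    ]


# Hypothetical 3-position log-odds matrix (NOT a real JASPAR profile).
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

ref = "TACGT"  # reference allele C at index 2 (made-up sequence)
alt = "TATGT"  # alternate allele T at index 2 (made-up sequence)

ref_scores = window_scores(ref, toy_pwm)
alt_scores = window_scores(alt, toy_pwm)

# Uniform weights are an assumption; the paper's weighting is not given here.
weights = [1.0] * len(ref_scores)
distance = weighted_euclidean(ref_scores, alt_scores, weights)
```

In this toy case only the middle window changes its score between alleles, so the distance reflects that single difference; a larger value would indicate a stronger predicted change in TF binding.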
