Abstract
This paper compares machine learning techniques and pattern discovery algorithms for the prediction of human single nucleotide polymorphisms (SNPs). We selected six pattern discovery algorithms (YMF, Projection, Weeder, MotifSampler, AlignACE and ANN-Spec) and two machine learning techniques (Random Forests and K-Nearest Neighbours) and applied them to the DNA sequences flanking non-coding SNPs on human chromosome 21. We compared the similarity of the patterns found by the different methods and validated the predictions using known SNPs on chromosome 22. Parameterization of both machine learning and pattern discovery algorithms was critical to their performance. Memory usage was broadly constant amongst the pattern discovery algorithms, but CPU running time varied significantly between deterministic and probabilistic pattern discovery methods: on average, probabilistic methods ran 19 times slower than deterministic methods. This is the first demonstration of SNP prediction, as well as the first comparison of machine learning and pattern discovery algorithms in SNP prediction studies.