Applications of Machine Learning Tools in Genomics: A Review

Joseph L Fracasso, Md Liakat Ali

doi:10.1007/978-3-030-34139-8_33

Abstract

To overcome the challenges of manually analyzing the large datasets inherent to DNA sequences, machine learning (ML) tools are commonly employed because of their relative accuracy and ease of implementation. However, no consensus exists regarding the most broadly applicable and effective ML tool for performing multiple analyses on DNA sequences, and many researchers instead opt for proprietary hybrids. This review found that the modal techniques in the surveyed literature were support vector machines and neural networks, both of which appeared in modified, proprietary forms tailored to DNA sequence analysis. Analyses focused principally on site-specific activities, which were then used to draw inferences about the molecule as a whole. These findings suggest that neural networks and support vector machines, as verified by Bayesian statistics, may be the optimal approach for analyzing long DNA sequences.

Full Text