Abstract

Advances in sequencing technologies are generating voluminous omics data, making the annotation of biological functional features a massive challenge. Sequence data from the well-studied model organism Saccharomyces cerevisiae are commonly used to test and validate in silico prediction methods. DNA replication is a critical cellular process, and the genomic location where it begins is generally referred to as the origin of replication. In this paper we investigate the application of bidirectional Long Short-Term Memory (LSTM) networks to the prediction of origin of replication sequences. LSTM networks have recently been shown to yield state-of-the-art performance in speech recognition and music generation. These networks are capable of learning long-term patterns through the use of multiplicative gates. We use a deep bidirectional LSTM to predict origin of replication sequences in Saccharomyces cerevisiae. Results demonstrate that LSTMs outperform commonly used machine learning classifiers such as Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN), and Hidden Markov Model (HMM). An important additional advantage of LSTMs is that they work directly on the sequences and obviate the need for hand-coded features.
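
To make the idea of a gated, bidirectional network reading raw sequence concrete, the following is a minimal sketch in plain Python. It is not the authors' model: the one-hot encoding, the single (not deep) layer, the hidden size, the random untrained weights, and all function names are assumptions chosen for illustration, and the example sequence is an arbitrary string. It only shows how a DNA string can be fed to forward and backward LSTM passes without hand-coded features.

```python
import math
import random

ALPHABET = "ACGT"  # assumed encoding order; any other base maps to an all-zero vector


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for the input, forget, output and candidate gates."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")}


def affine(weights, bias, vec):
    """Compute weights @ vec + bias for list-based matrices."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def lstm_step(params, x, h, c):
    """One LSTM time step; the gates multiply the cell state element-wise."""
    z = x + h  # concatenate the current input with the previous hidden state
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, inputs, hidden_size):
    """Run the LSTM over a sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
    return h


def bilstm_score(seq, fwd, bwd, out_w, out_b, hidden_size):
    """Read the sequence in both directions and squash the result to (0, 1)."""
    xs = one_hot(seq)
    h_fwd = run_lstm(fwd, xs, hidden_size)        # left-to-right pass
    h_bwd = run_lstm(bwd, xs[::-1], hidden_size)  # right-to-left pass
    features = h_fwd + h_bwd                      # concatenate both summaries
    return sigmoid(sum(w * v for w, v in zip(out_w, features)) + out_b)


rng = random.Random(0)
HIDDEN = 8  # illustrative hidden size
fwd = init_lstm(4, HIDDEN, rng)
bwd = init_lstm(4, HIDDEN, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN)]
score = bilstm_score("ATTTATGTTTA", fwd, bwd, out_w, 0.0, HIDDEN)
```

Because the weights are random, `score` carries no biological meaning here; in practice the weights would be learned from labelled origin and non-origin sequences, typically with a deep learning framework rather than hand-written loops.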

