Abstract
Protein secondary structure prediction is an emerging topic in bioinformatics that helps in understanding the functions of proteins and their role in drug discovery, medicine, and biology. In this research we applied two recurrent neural network approaches: Bi-LSTM (Bidirectional Long Short-Term Memory) and LSTM (Long Short-Term Memory). We focused on primary structures of up to 134 amino acids in length. Our proposed model first builds an ‘Indexed Lexicon of corpus’ by converting the primary structure strings into tri-grams. Each tri-gram snippet of a primary structure is then replaced with its associated index from the ‘Indexed corpus’, and the resulting index vector is fed into the proposed Bi-LSTM and LSTM models. The best accuracy was obtained with two Bi-LSTM layers in the Bi-LSTM model and three LSTM layers in the LSTM model. To reduce bias and limit overfitting, we used two dropout layers in each model. We evaluated the models on the ccPDB 2.0 benchmark dataset, which annotates protein secondary structure with eight states (sst8). On this sst8 task the proposed LSTM model achieved 83.24% accuracy and the Bi-LSTM model achieved 89.10% accuracy. Both models were trained for 50 epochs with a batch size of 64 and compiled with the ‘adam’ optimizer and the ‘categorical crossentropy’ loss function. To keep the evaluation balanced across the dataset, we also employed 5-fold cross-validation.
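The tri-gram indexing step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of overlapping tri-grams, the reservation of index 0 for padding, and the mapping of unseen tri-grams to that same index are all assumptions made for illustration.

```python
from typing import Dict, List

PAD_INDEX = 0  # assumption: index 0 is reserved for padding and unseen tri-grams


def to_trigrams(sequence: str) -> List[str]:
    """Split a primary-structure string into overlapping tri-grams (assumed scheme)."""
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def build_lexicon(sequences: List[str]) -> Dict[str, int]:
    """Give each distinct tri-gram an integer index in order of first appearance."""
    lexicon: Dict[str, int] = {}
    for seq in sequences:
        for gram in to_trigrams(seq):
            if gram not in lexicon:
                lexicon[gram] = len(lexicon) + 1  # indices start at 1; 0 is padding
    return lexicon


def encode(sequence: str, lexicon: Dict[str, int], max_len: int) -> List[int]:
    """Replace each tri-gram with its index, then truncate or right-pad to max_len."""
    ids = [lexicon.get(gram, PAD_INDEX) for gram in to_trigrams(sequence)]
    ids = ids[:max_len]
    return ids + [PAD_INDEX] * (max_len - len(ids))


# Toy sequences, not taken from ccPDB 2.0.
toy_sequences = ["MKVLAA", "KVLA"]
toy_lexicon = build_lexicon(toy_sequences)
toy_encoded = [encode(seq, toy_lexicon, max_len=6) for seq in toy_sequences]
```

Fixed-length index vectors of this kind are what an embedding layer followed by stacked LSTM or Bi-LSTM layers would typically consume; the padding length here is arbitrary and would in practice follow from the 134-residue limit.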