Long sequence feature extraction based on deep learning neural network for protein secondary structure prediction

Yehong Chen

doi:10.1109/itoec.2017.8122472

Abstract

In this paper, a long sequence feature extraction method (LSFE) is proposed for protein secondary structure prediction. The method is based on a deep learning architecture composed mainly of three layers: a sparse auto-encoder, a convolutional feature extraction layer, and a softmax classifier. The PSSM (position-specific scoring matrix) is used as the raw sequence representation. Two groups of self-taught feature filters are learned from 5-polypeptides and 13-polypeptides by the sparse auto-encoder layer. The new representations of 35-polypeptides produced by the convolution layer are then fed into the softmax classifier, which serves as the shallow top-level classifier, for fast prediction. The experimental results indicate that an overall accuracy (Q3) of around 74% on 25PDB is achieved within a very short waiting time. Hence this deep learning architecture overcomes the upper bound on window size in the state-of-the-art SVM+PSSM classifier and shows potential for future work on larger datasets.
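The data flow described above can be illustrated with a minimal NumPy sketch. It is not the author's implementation. It assumes that "5-", "13-", and "35-polypeptides" mean PSSM windows of 5, 13, and 35 consecutive residues, that the PSSM has the usual 20 columns per residue, and that the filters use a sigmoid activation. The filter count (8), the random weights standing in for trained auto-encoder and softmax parameters, and all function names are hypothetical; the abstract specifies none of them. The `q3` helper computes the overall three-state accuracy as the fraction of residues whose class is predicted correctly.

```python
import numpy as np

def sliding_windows(pssm, width):
    """Return every contiguous window of `width` residues, flattened to one vector."""
    n = pssm.shape[0]
    return np.stack([pssm[i:i + width].ravel() for i in range(n - width + 1)])

def convolve_filters(window, weights, bias, sub_width):
    """Apply filters learned on sub_width-residue patches at every offset of one window."""
    patches = sliding_windows(window, sub_width)        # (offsets, sub_width * 20)
    acts = 1.0 / (1.0 + np.exp(-(patches @ weights.T + bias)))  # assumed sigmoid units
    return acts.ravel()

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def q3(pred, true):
    """Fraction of residues whose three-state label is predicted correctly."""
    return float(np.mean(np.asarray(pred) == np.asarray(true)))

rng = np.random.default_rng(0)
pssm = rng.normal(size=(100, 20))   # placeholder PSSM: 100 residues x 20 columns

n_filters = 8                        # hypothetical; not given in the abstract
# Random stand-ins for the two filter groups a sparse auto-encoder would learn.
W5, b5 = rng.normal(scale=0.1, size=(n_filters, 5 * 20)), np.zeros(n_filters)
W13, b13 = rng.normal(scale=0.1, size=(n_filters, 13 * 20)), np.zeros(n_filters)

# One 35-residue window per predictable position, restored to (35, 20).
windows35 = sliding_windows(pssm, 35).reshape(-1, 35, 20)

# New representation of each window: both filter groups convolved over it.
features = np.stack([
    np.concatenate([convolve_filters(w, W5, b5, 5),
                    convolve_filters(w, W13, b13, 13)])
    for w in windows35
])

# Untrained softmax layer over three classes (helix, strand, coil).
W_out = rng.normal(scale=0.01, size=(features.shape[1], 3))
probs = softmax(features @ W_out)
pred = probs.argmax(axis=1)
```

With these assumed sizes, each 35-residue window yields 31 offsets for the 5-residue filters and 23 offsets for the 13-residue filters, so `features` has 8 × 31 + 8 × 23 = 432 columns. In a trained system the weights would come from the sparse auto-encoder and from fitting the softmax layer on labelled windows, and a pooling step could be added to shrink the representation.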
