Abstract

Protein Subcellular Localization (PSL) prediction for recently evolved Unknown Protein Sequences (UPS) is vital for understanding protein function. PSL also provides insight into the prediction of harmful and useful characteristics, the diagnosis of disease, and drug design. In the present work, an OCNN model based on One-Hot Encoding (OHE) and a Convolutional Neural Network (CNN) is proposed for the functional characterization of protein sequences through PSL. A Gram-Positive (G+) dataset of 473 known protein sequence samples covering four subcellular localizations is used to train and validate the OCNN model. As essential preprocessing, each raw protein sequence is encoded using OHE, and the lengths of the encoded sequences are standardized and normalized through padding and capping. Next, the encoded and standardized protein sequence samples are convolved in the hidden layer of the OCNN model using the ReLU, TanH, and Sigmoid activation functions. After that, the Adam and Stochastic Gradient Descent (SGD) optimizers are used for PSL prediction of the protein sequence samples. The OCNN model achieved 92.94% accuracy on known protein sequences with the combination of Sigmoid, Softmax, and Adam. The validated OCNN model can be further used for function prediction of UPS, where 64.83% accuracy is achieved with the combination of ReLU, Softmax, and Adam.
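
The preprocessing step described above (one-hot encoding followed by padding and capping to a fixed length) can be illustrated with a minimal sketch. The abstract does not specify the details, so everything here is an assumption: the 20-letter amino acid alphabet, the function name `one_hot_encode`, the `max_len` parameter, zero rows for unrecognized residues, and padding at the end of the sequence.

```python
# Assumed alphabet: the 20 standard amino acids (not stated in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Encode a protein sequence as max_len rows of len(AMINO_ACIDS) values.

    Hypothetical helper: sequences longer than max_len are capped
    (truncated), shorter ones are padded with all-zero rows at the end.
    """
    capped = sequence.upper()[:max_len]  # capping
    rows = []
    for residue in capped:
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:  # unrecognized residues stay as all-zero rows
            row[idx] = 1
        rows.append(row)
    # padding: append zero rows until the fixed length is reached
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


# Usage: a short sequence is padded, a long one is capped.
short = one_hot_encode("MKV", 5)
long_ = one_hot_encode("MKVLAAGIVGLLLAQ", 5)
```

Both `short` and `long_` end up with the same fixed shape (5 rows of 20 values), which is what a convolutional input layer needs regardless of the original sequence length.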
