DeepFam: deep learning based alignment-free method for protein family modeling and prediction.

Seokjun Seo, Minsik Oh, Youngjune Park, Sun Kim

doi:10.1093/bioinformatics/bty275

Open Access

Journal: Bioinformatics
Publication Date: Jun 27, 2018
Citations: 105
License type: CC BY-NC 4.0

Affiliation: Seoul National University

Abstract

Motivation

Next-generation sequencing technologies generate a large number of newly sequenced proteins, and assigning biochemical functions to these proteins is an important task. However, biological experiments are too expensive to characterize so many protein sequences, so protein function prediction is primarily done with computational modeling methods, such as profile Hidden Markov Models (pHMMs) and k-mer based methods. These existing methods have limitations: k-mer based methods are not accurate enough to assign protein functions, and pHMMs are not fast enough to handle the large number of protein sequences produced by numerous genome projects. A more accurate and faster protein function prediction method is therefore needed.

Results

In this paper, we introduce DeepFam, an alignment-free method that extracts functional information directly from sequences without the need for multiple sequence alignments. In extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) datasets, DeepFam achieved better accuracy and runtime for predicting protein functions than the state-of-the-art methods, both alignment-free and alignment-based. Additionally, we showed that DeepFam can capture conserved regions to model protein families: it was able to detect conserved regions documented in the Prosite database while predicting protein functions. Our deep learning method will be useful in characterizing the functions of the ever-increasing number of protein sequences.

Availability and implementation

Code is available at https://bhi-kimlab.github.io/DeepFam.
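To make the term "alignment-free" concrete, the sketch below shows the kind of k-mer composition feature that the k-mer based baselines mentioned in the abstract typically rely on. It is an illustration only and is not DeepFam's method or code: the function names `kmer_counts` and `kmer_vector`, the 20-letter alphabet constant, and the example sequence are all assumptions introduced here.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=3):
    """Count every overlapping k-mer in a protein sequence (hypothetical helper)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence, k=2):
    """Turn a sequence into a fixed-length vector of k-mer counts.

    The vector has 20**k entries regardless of sequence length, so sequences
    can be compared or classified without aligning them first.
    k-mers containing non-standard residues are ignored.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Made-up example sequence, not taken from the COG or GPCR datasets.
example = "MKTAYIAKQR"
vec = kmer_vector(example, k=2)
print(len(vec), sum(vec))
```

A vector like this can be fed to any standard classifier. Because it discards the position of each k-mer, it cannot by itself localize a conserved region, which is one plausible reason such features lose accuracy relative to profile-based models.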