Abstract
Background
Inferring a protein's membership in metabolic pathways has become an important task in the functional annotation of proteins. Membership information provides valuable context for the basic functional annotation and can also aid the reconstruction of incomplete pathways. Previous work has shown that such inference can succeed using various Gene Ontology (GO) similarity measures.
Results
In this work, we explore integrating ontology and sequential information to further improve accuracy. Specifically, we developed a neural network model with an architecture tailored to integrate features from different sources. We also built models that can make predictions from either a pathway-centric or a protein-centric perspective. We tested the classifiers using 5-fold cross validation on all metabolic pathways reported in the KEGG database.
Conclusions
The test results demonstrate that, by integrating ontology and sequential information with a tailored architecture, our deep neural network method significantly outperforms existing methods in the pathway-centric mode. In the protein-centric mode, our method either outperforms or performs comparably with a suite of existing GO term based semantic similarity methods.
Highlights
Inferring a protein's membership in metabolic pathways has become an important task in the functional annotation of proteins
Various similarity measures have been developed to quantify the semantic similarity of Gene Ontology (GO) terms and have been applied to the quantitative comparison of the functional similarity of gene products, but most of these methods were not developed for metabolic pathway membership inference [5,6,7,8,9,10]
We developed a method that uses the graph structure of the Gene Ontology and the information contained in ontology terms as the feature representation of proteins
Summary
We set out to explore integrating ontology and sequential information to further improve the accuracy of pathway membership inference. We developed a neural network model with an architecture tailored to integrate features from different sources. We built models that can make predictions from either a pathway-centric or a protein-centric perspective. We tested the classifiers using 5-fold cross validation on all metabolic pathways reported in the KEGG database.
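As a rough illustration of the 5-fold cross validation protocol mentioned above, the following minimal sketch partitions sample indices into five folds and yields train/test splits. The function name `k_fold_indices`, the random seed, and the sample count are hypothetical choices for illustration only; the text does not describe how the authors constructed their folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Hypothetical helper: the fold construction is an assumption,
    not the authors' documented procedure.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the splits are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the i-th one forms the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example: 23 samples (e.g. protein-pathway pairs) split into 5 folds.
splits = list(k_fold_indices(23, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample for evaluation once.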