Abstract

Background

Inferring a protein’s membership in metabolic pathways has become an important task in the functional annotation of proteins. Membership information provides valuable context for basic functional annotation and aids the reconstruction of incomplete pathways. Previous work has shown that such inference can succeed using various similarity measures based on gene ontology.

Results

In this work, we set out to explore integrating ontology and sequential information to further improve accuracy. Specifically, we developed a neural network model with an architecture tailored to integrating features from different sources. Furthermore, we built models that can make predictions from either a pathway-centric or a protein-centric perspective. We tested the classifiers using 5-fold cross-validation on all metabolic pathways reported in the KEGG database.

Conclusions

The test results demonstrate that, by integrating ontology and sequential information with a tailored architecture, our deep neural network method significantly outperforms existing methods in the pathway-centric mode. In the protein-centric mode, our method either outperforms or performs comparably with a suite of existing GO term-based semantic similarity methods.

Highlights

  • Inferring a protein’s membership in metabolic pathways has become an important task in the functional annotation of proteins

  • Various similarity measures have been developed to quantify the semantic similarity of Gene Ontology (GO) terms and have been applied to the quantitative comparison of functional similarity between gene products; however, most of these methods were not developed for metabolic pathway membership inference [5,6,7,8,9,10]

  • We developed a method that includes the graph structure of the gene ontology and the information contained in ontology terms in the feature representation of proteins


