Abstract

In general, most accurate gene/protein annotations are provided by curators. Despite having lesser evidence strengths, it is inevitable to use computational methods for fast and a priori discovery of protein function annotations. This paper considers the problem of assigning Gene Ontology (GO) annotations to partially annotated or newly discovered proteins. We present a data mining technique that computes the probabilistic relationships between GO annotations of proteins on protein-protein interaction data, and assigns highly correlated GO terms of annotated proteins to non-annotated proteins in the target set. In comparison with other techniques, probabilistic suffix tree and correlation mining techniques produce the highest prediction accuracy of 81% precision with the recall at 45%. Code is available upon request. Results and used materials are available online at http://kirac.case.edu/PROTAN.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call