Abstract
Gene ontology (GO) is a standardized and controlled vocabulary of terms that describe the molecular functions, biological roles and cellular locations of proteins. GO terms and GO hierarchy are regularly updated as the accumulated biological knowledge. More than 50,000 terms are included in GO and each protein is annotated with several or dozens of these terms. Therefore, accurately predicting the association between proteins and massive GO terms is rather challenging. To accurately predict the association between massive GO terms and proteins, we proposed a method called Hashing GO for protein function prediction (HashGO in short). HashGO firstly adopts a protein-term association matrix to store available GO annotations of proteins. Then, it tailors a graph hashing method to explore the underlying structure between GO terms and to obtain a series of hash functions to compress the high-dimensional protein-term association matrix into a low-dimensional one. Next, HashGO computes the semantic similarity between proteins based on Hamming distance on that low-dimensional matrix. After that, it predicts missing annotations of a protein based on the annotations of its semantic neighbors. Experimental results on archived GO annotations of two model species (Yeast and Human) show that HashGO not only more accurately predicts functions than other related approaches, but also runs faster than them.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.