Abstract
A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction.We show that diverse state-of-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with state-of-the-art SVM classifier to predict novel genes to 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development. FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.
Highlights
Defining the role of proteins in pathways is among the key challenges of human genomics
We show that functional genomics can complement traditional sequence similarity measures to improve the transfer of gene annotations between organisms
Our results show that functional knowledge transfer can improve the coverage and accuracy of machine learning methods used for gene function prediction in a diverse set of organisms
Summary
Defining the role of proteins in pathways is among the key challenges of human genomics. Many successful approaches have been developed for prediction of protein function and pathway membership [1,2,3,4,5,6], they rely on prior knowledge in the organism of interest to make new predictions (i.e. at least some genes in the organism already annotated to the pathway) [7,8,9,10,11]. Gene expression and protein interaction profiles can be used by machine learning methods to associate novel genes to pathways based on previously known pathway members [15,16] The potential of such computational approaches to direct experiments has been demonstrated in studies investigating mitochondrial biogenesis [17] and seed pigmentation [18]. Our study describes a method to robustly increase the set of prior gene
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.