Abstract
Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. 
We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.
Highlights
Machine learning and data mining techniques have been applied to a wealth of genome-scale data to produce meaningful predictions of gene/protein involvement in biological processes and pathways [1,2,3,4,5,6,7,8,9]
We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods
We identified a total of 135 genes with existing literature evidence that were "under-annotated." We have presented this list to the Saccharomyces Genome Database (SGD), and they are evaluating these observations using their established curatorial procedures; as of nearly half of these genes have been added to the annotations
Summary
Machine learning and data mining techniques have been applied to a wealth of genome-scale data to produce meaningful predictions of gene/protein involvement in biological processes and pathways [1,2,3,4,5,6,7,8,9]. Data continue to be generated at a rate that outpaces the characterization of gene functions [12]. This disparity between the computational and experimental aspects of gene function discovery may be due to a lack of clear demonstrations of the effectiveness of computation in directing laboratory efforts. No large-scale studies have been performed to fully explore the ability of computational methods to accurately assign functions to sizeable sets of uncharacterized proteins. Without such comprehensive evaluations, it remains unclear how computational methods can best be employed to guide experimental efforts in discovering novel biology.