Analysis of the information content of biological sequence data, and its subsequent use to predict the function of the corresponding molecules, are perhaps the most important aspects of the comparatively new discipline of bioinformatics. Of the millions of genes now being sequenced, the vast majority will only ever have their function assigned in silico; only a select few will have their activities assigned by experimentation. The accuracy of predictions rests largely on how similar a new gene (or a portion of it) is to a known sequence whose function has been described experimentally. When such similarities are below a certain threshold, different techniques can lead to the prediction of different functions, highlighting the need for experimental validation. Iyer et al. [1] (Iyer, L.M. et al. Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences. Genome Biol. 2001; 2: 1–11) have now re-examined several examples in which prediction-based experimentation was used to assign functions to uncharacterized genes. The authors argue that alternative prediction methods suggest functions different from those demonstrated experimentally, calling into question the original computational approaches.
Iyer et al. describe six examples in which computational methods had suggested functions for genes to which no biological role had previously been assigned. In all these cases, experimentation had subsequently supported the predictions and, by inference, the respective computational methods. Iyer et al. then applied one or more different computational methods to predict function. In all cases, these gave very different results of higher statistical significance than the original predictions, thus contradicting the experimental findings.
For example, the gene encoding dihydropteroate synthase (DHPS), an enzyme of folate metabolism, was not initially found in archaeal genomes. Nevertheless, two predicted proteins (MJ0301 and MJ0107), showing fairly low sequence similarity to bacterial DHPS, were found to be encoded in the genome of Methanococcus jannaschii using the program ORF. When these proteins were tested for DHPS activity in vitro, only MJ0301 showed the expected biological function. By contrast, the sequence- and fold-recognition methods applied by Iyer et al. predict that MJ0107 is a DHPS and that MJ0301 is instead a member of the β-lactamase-like superfamily of metal-dependent hydrolases.
The six cases chosen by Iyer et al. represent exceptions to the rule; in most cases, computational and experimental methods complement each other during assignment of function to unknown genes. However, disputes are most likely to arise over genes to which no function can be readily assigned, raising the question of whether biochemistry or bioinformatics provides the more reliable answer in such cases. As the authors imply, both approaches might be correct and the proteins in question could all be examples of exaptation, where a molecule known to have a certain activity is found to perform a completely unrelated function. Such unexpected functions are not without precedent; for example, metabolic enzymes (e.g. aldehyde dehydrogenase) are used as lens crystallins. Based on their analyses, Iyer et al. also make several functional predictions, some of which can be tested experimentally; such tests would undoubtedly help resolve the disputed functional assignments and address the issue of exaptation. Beyond these particular issues, the work of Iyer et al. makes it clear that accurate annotation of genome sequences cannot be achieved by either bioinformatics or biochemistry alone.