Abstract

Predicting fitness in natural populations is a major challenge in biology. It may be possible to leverage fast-accumulating genomic data sets to infer the fitness effects of mutant alleles, allowing evolutionary questions to be addressed in any organism. In this paper, we investigate the utility of one such tool, called PROVEAN. This program compares a query sequence with existing data to provide an alignment-based score for any protein variant, with scores categorized as neutral or deleterious based on a pre-set threshold. PROVEAN has been used widely in evolutionary studies, for example, to estimate mutation load in natural populations, but has not been formally tested as a predictor of aggregate mutational effects on fitness. Using three large published data sets on the genome sequences of laboratory mutation accumulation lines, we assessed how well PROVEAN predicted the actual fitness patterns observed, relative to other metrics. In most cases, we find that a simple count of the total number of mutant proteins is a better predictor of fitness than the number of proteins with variants scored as deleterious by PROVEAN. We also find that the sum of all mutant protein scores explains variation in fitness better than the number of mutant proteins in one of the data sets. We discuss the implications of these results for studies of populations in the wild.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call