Personal phenotypes to go with personal genomes

Michael Snyder, Mark Gerstein, Sherman Weissman

doi:10.1038/msb.2009.32

Abstract

Mol Syst Biol. 5: 273

With the cost of DNA sequencing decreasing rapidly, it is likely that the genome sequences of many individuals will be determined. In fact, if half of the individuals in industrialized countries choose to have their genomes sequenced, then well over 500 million personal genome sequences will be determined. Currently, such genetic information is likely to be of limited value to the individual, as the number of loci that provide useful predictive information is quite small (probably less than 200). Indeed, recent analyses of common complex traits such as diabetes, body mass and height show that in each case the genetically identifiable contribution from multiple candidate loci (18 in the case of diabetes) is only a small percentage (less than 7%) of the total identifiable genetic load (Gaulton et al., 2008; Willer et al., 2009); thus, the interpretable genetic contributions that can be identified are quite minor. Presumably, either many low‐frequency alleles at different loci contribute to the genetic load, or perhaps many phenotypes result from other phenomena, such as synergistic effects between variants at more than one locus or between different loci and factors in the environment, recurrent spontaneous mutations, or epigenetic defects. Regardless of which proves to be correct (likely a differing mixture of effects for different diseases), the ability to accurately correlate all bases with precise phenotypes is likely to be powerful only if a common set of phenotypes is scored. The power of 500 million sequences correlated with 500 million phenotypes can both reveal small contributions and help identify potential causative mutations. Indeed, a data set of this size would greatly exceed that of even the large genome‐wide association studies that typically analyze thousands of individuals to tens of thousands …
