Update to Blangero et al.'s "Quantitative Trait Nucleotide Analysis Using Bayesian Model Selection" (2005): From QTL Localization to Functional Variant Identification

John Blangero

Keywords: Quantitative Trait Loci, Linkage Studies, Bayesian Quantitative Trait Nucleotide (BQTN) Analysis, SOLAR Genetics Analysis Package

"Quantitative Trait Nucleotide Analysis Using Bayesian Model Selection" and its companion, "The F7 Gene and Clotting Factor VII Levels: Dissection of a Human Quantitative Trait Locus," by Soria et al. (2005), were originally published back to back in 2005. As with many scientific papers, the genesis of this one followed a rather unusual trajectory. Originally, the two papers were components of a single paper that we submitted to the journal Science. From that august journal, we received three quite verbose reviews. The reviews were generally favorable, but the editors realized that the methodological detail the reviewers requested would make the paper incompatible with the journal's format. We therefore began planning to separate the papers and search for another journal. At that time, Sarah Williams-Blangero, then the editor-in-chief of Human Biology and a close collaborator of mine, asked me to consider sending the papers to her journal. She believed that the papers would provide solid material for her stated goal of increasing the journal's emphasis on the genetic dissection of normal human variation, a goal that I strongly supported then and still support. Being a good citizen in the American Association of Anthropological Genetics, the dominant sponsoring organization of the journal, I agreed and convinced my colleagues to send both papers to Human Biology for consideration. The final published papers formed a cohesive pairing, although each stands on its own merits.
I have long worked on the twin goals of localization and identification of human quantitative trait loci (QTLs) (Blangero 2004). Much of my career has focused on the initial localization of QTLs using linkage-based variance component methods such as those developed by Almasy and Blangero (1998). Currently, many localization studies use a genome-wide association paradigm, but the goal is still the same: to obtain an initial genomic localization of the most important QTLs for a given trait. In linkage-based approaches, the localization region typically spans 10-15 Mb of sequence, whereas in the newer association-based approaches this interval is reduced to approximately 500 kb. However, localization, though essential, is the less interesting of the two goals. The actual identification of the genes underlying human phenotypic variation is clearly the ultimate and more biologically important goal of this type of human genetic research. Of course, identifying genes and the underlying functional sequence variants requires elaborate wet laboratory methods for ultimate proof. These laboratory assays are technically difficult, labor intensive, and expensive. We cannot routinely perform such assays on every sequence variant that exhibits a strong correlation with a phenotype of interest. As a statistical geneticist, I set out to develop a robust statistical method that would objectively prioritize sequence variants in terms of their likelihood of being directly functional. Bayesian quantitative trait nucleotide (BQTN) analysis is the result of this quest. The 2005 paper was based on the premise that obtaining complete gene-centric sequence variation would soon be feasible on a large scale. This has certainly turned out to be true. Given such complete sequence information, is it possible to statistically choose the most likely functional variants?
The original BQTN model was an attempt to estimate the posterior probability of effect (or functionality) for each sequence variant. It can be shown that, under some weak regularity assumptions, a simple approach that uses both Bayesian model averaging and Bayesian model selection can accurately choose the most likely functional variants. Minimally, it can choose the set of variants that optimally predicts the phenotype. The implementation discussed in the 2005 paper is only weakly Bayesian in the sense that it does not use full priors for every parameter. However, the paper hints at how informative priors (such as joint utilization of evolutionary conservation information) might be used to enhance selection of functional variants. For simplicity, the paper uses a...
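As an illustration only, the combination of Bayesian model averaging and model selection described above can be sketched in a few lines of Python. This is not the implementation in SOLAR or in the 2005 paper. The sketch assumes unrelated individuals (no pedigree structure), ordinary least squares, a BIC approximation to posterior model probabilities, equal prior weight on every model, and simulated genotypes. The names `bic_linear` and `variant_posteriors` are hypothetical.

```python
import itertools
import numpy as np


def bic_linear(y, X):
    """BIC (up to an additive constant) of a least-squares fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def variant_posteriors(y, G):
    """Enumerate every subset of variants and average over the models.

    Returns the approximate posterior probability that each variant has an
    effect, plus the single most probable model (a tuple of column indices).
    Assumption: model weights are proportional to exp(-BIC / 2).
    """
    n, m = G.shape
    models, bics = [], []
    for size in range(m + 1):
        for subset in itertools.combinations(range(m), size):
            # Intercept plus the genotype columns of the chosen variants.
            X = np.column_stack([np.ones(n)] + [G[:, j] for j in subset])
            models.append(subset)
            bics.append(bic_linear(y, X))
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))  # subtract min for stability
    weights /= weights.sum()
    post = np.zeros(m)
    for subset, w in zip(models, weights):
        for j in subset:
            post[j] += w  # model averaging: sum weights of models containing j
    best = models[int(np.argmax(weights))]  # model selection: top model
    return post, best


# Simulated data (an assumption, not real genotypes): 5 variants coded 0/1/2.
rng = np.random.default_rng(0)
n, m = 400, 5
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
# Variant 3 is made to correlate with variant 2, mimicking linkage disequilibrium.
independent = rng.random(n) < 0.2
G[:, 3] = np.where(independent, G[:, 3], G[:, 2])
# Only variant 2 is functional in this simulation.
y = 0.8 * G[:, 2] + rng.normal(size=n)

post, best = variant_posteriors(y, G)
for j, p in enumerate(post):
    print(f"variant {j}: approximate posterior probability of effect = {p:.3f}")
print("most probable model:", best)
```

Exhaustive enumeration is used here only because the example has five variants; the number of models doubles with each additional variant, so a realistic analysis would need a search strategy rather than a full enumeration.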