Abstract

Given the increasing size of modern genetic data sets and, in particular, the move towards genome-wide studies, there is merit in considering analyses that gain computational efficiency by being more heuristic in nature. With this in mind, we present results of cladistic analysis methods applied to the Genetic Analysis Workshop 15 Problem 3 simulated data (answers known). Our analysis attempts to capture similarities between individuals using a series of trees, and then looks for regions in which mutations on those trees can successfully explain a phenotype of interest. Existing varieties of such algorithms assume that haplotypes are known or have been inferred, an assumption that is often unrealistic for genome-wide data. We therefore present an extension of these methods that can successfully analyze genotype, rather than haplotype, data.

Highlights

  • In this paper we adopt a cladistic approach to association mapping

  • The methods are based upon haplotype analysis rather than single-nucleotide polymorphism (SNP)-by-SNP analysis

  • We show that a cladistic approach based upon greedy haplotype phase information performs quite well


Summary

Introduction

In this paper we adopt a cladistic approach to association mapping. Such methods were first introduced by Templeton et al. [1], and other researchers have subsequently developed the ideas of that paper or used other cluster-based approaches [2,3,4]. The fact that current cladistic analysis methods act upon haplotype rather than genotype data introduces a problem. Data are increasingly being collected for large numbers of SNPs, frequently via a SNP chip, for which data on hundreds of thousands of SNPs might be collected. For such data, haplotype phase information is typically unavailable. The inference of haplotype phase is highly computationally intensive: even the better algorithms are unable to infer phase in data sets containing thousands of SNPs, rather than tens or hundreds. Such an approach is completely
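To make the phase problem concrete, the following minimal sketch enumerates the haplotype pairs that are consistent with a single unphased genotype. It is an illustration only and is not the method of this paper. The function name `compatible_haplotype_pairs` and the 0/1/2 genotype encoding (number of copies of one allele at each SNP) are assumptions made for this example. A genotype with k heterozygous SNPs is compatible with 2^(k-1) unordered haplotype pairs when k is at least 1, which is why exhaustive phasing quickly becomes infeasible as the number of SNPs grows.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2, the number of copies of one allele
    at each SNP (a hypothetical encoding chosen for this sketch).
    Returns a set of (haplotype, haplotype) tuples, each pair sorted so
    that the two possible orderings are counted once.
    """
    # Only heterozygous sites (coded 1) have ambiguous phase.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites fix the allele on both haplotypes: 0 -> 0, 2 -> 1.
    base = [g // 2 for g in genotype]

    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site] = allele       # one haplotype carries this allele
            h2[site] = 1 - allele   # the other carries the alternative
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


if __name__ == "__main__":
    for g in ([0, 2, 0], [1, 1], [1, 1, 1], [1, 0, 1, 2, 1, 1]):
        print(g, len(compatible_haplotype_pairs(g)))
```

In this toy setting the count doubles with each additional heterozygous SNP, so brute-force enumeration is only practical for a handful of sites.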

