Abstract

Identifying genomic regions with unusually high local haplotype homozygosity is a powerful strategy for characterizing candidate genes responding to natural or artificial positive selection. To that end, statistics measuring the extent of haplotype homozygosity within (e.g. EHH, iHS) and between (Rsb or XP-EHH) populations have been proposed in the literature. The rehh package for R was previously developed to facilitate genome-wide scans of selection based on the analysis of long-range haplotypes. However, its performance was not sufficient to cope with the growing size of available data sets. Here, we propose a major upgrade of the rehh package, which includes improved processing of the input files, a faster algorithm to enumerate haplotypes, and multithreading. As illustrated with the analysis of large human haplotype data sets, these improvements decrease the computation time by more than one order of magnitude. This new version of rehh thus makes it possible to perform iHS-, Rsb- or XP-EHH-based scans on large data sets. The package rehh 2.0 is available from the CRAN repository (http://cran.r-project.org/web/packages/rehh/index.html) together with help files and a detailed manual.

Highlights

  • Next-generation sequencing (NGS) technologies have deeply transformed the nature of polymorphism data

  • We used rehh 2.0 to analyze two human data sets consisting of 236 and 342 haplotypes of 110,200 single nucleotide polymorphisms (SNPs) mapping to HSA2 that were sampled in the CEU and JPT+CHB populations, respectively

  • Although the R package rehh (Gautier & Vitalis, 2012) has been widely used since its first release, the increasing size of the haplotype data sets typically available in most species has led to serious limitations


Introduction

Next-generation sequencing (NGS) technologies have deeply transformed the nature of polymorphism data. Until recently, population geneticists were limited by the amount of data available from a handful of presumably independent markers; they now have access to dense single nucleotide polymorphism (SNP) data in both model and non-model species (Davey et al., 2011). In those species where genome assemblies are available, the analysis of haplotype structure in a population has proved useful to detect recent positive selection (Sabeti et al., 2002). A powerful strategy to characterize candidate genes responding to natural or artificial positive selection consists in identifying genomic regions with unusually high local haplotype homozygosity, relative to neutral expectation (Sabeti et al., 2002).
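
To make the notion of local haplotype homozygosity concrete, the following minimal sketch computes an EHH-style value for a toy data set. It is written in plain Python and is not the rehh implementation; the function name `ehh`, its arguments, and the toy haplotypes are assumptions made for illustration. It uses the usual definition in the spirit of Sabeti et al. (2002): among chromosomes carrying a given core allele, the probability that two chromosomes drawn at random are identical over the interval from the focal SNP to a given marker.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, focal, allele, upto):
    """Illustrative EHH for `allele` at marker index `focal`, extended to `upto`.

    `haplotypes` is a list of equal-length allele sequences (hypothetical input
    format, one sequence per chromosome). Returns NaN when fewer than two
    chromosomes carry the core allele.
    """
    lo, hi = sorted((focal, upto))
    # Keep only chromosomes carrying the core allele, restricted to the interval.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[focal] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    # Count how many chromosomes share each distinct extended haplotype.
    counts = Counter(carriers)
    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data: five chromosomes, four biallelic markers (0/1 coding is an assumption).
toy_haplotypes = [
    (0, 1, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 1, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 1),
]

# EHH of allele 1 at marker 1, extended one marker at a time to the right.
ehh_right = [ehh(toy_haplotypes, 1, 1, k) for k in (1, 2, 3)]
```

In this sketch the value starts at 1 at the focal marker and can only stay equal or decrease as the interval grows, which is the decay that long-range haplotype statistics summarize. Enumerating the distinct haplotypes at every marker is the step that becomes costly on dense data sets.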
