Abstract

Motivation: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth.

Results: We present the beta-binomial Gaussian process (BBGP) model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present simulations exploring different experimental design choices and results on real data from a Drosophila experimental evolution experiment on temperature adaptation.

Availability and implementation: R software implementing the test is available at https://github.com/handetopa/BBGP.

Contact: hande.topa@aalto.fi, agnes.jonas@vetmeduni.ac.at, carolin.kosiol@vetmeduni.ac.at, antti.honkela@hiit.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
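To illustrate how finite sequencing depth translates into uncertainty about an allele frequency, the following minimal Python sketch computes the posterior mean and variance of a proportion from read counts. It assumes a uniform Beta(1, 1) prior and binomial sampling of reads; the function name `beta_posterior` and its parameters are hypothetical and are not taken from the BBGP R package, whose exact formulation may differ.

```python
def beta_posterior(alt_reads, depth, prior_a=1.0, prior_b=1.0):
    """Posterior mean and variance of an allele frequency given read counts.

    Assumption: a Beta(prior_a, prior_b) prior (uniform by default) combined
    with binomial sampling of reads, which gives the posterior
    Beta(prior_a + alt_reads, prior_b + depth - alt_reads).
    """
    if depth < 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth")
    a = prior_a + alt_reads
    b = prior_b + depth - alt_reads
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


# Same observed frequency (50%), different sequencing depths.
low_mean, low_var = beta_posterior(alt_reads=5, depth=10)
high_mean, high_var = beta_posterior(alt_reads=50, depth=100)
```

Both calls give the same posterior mean, but the variance at depth 100 is much smaller than at depth 10. A per-time-point variance of this kind is one plausible way to let a time-series model weight deeply sequenced observations more heavily than shallow ones.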

Highlights

  • SNPs with a consistent change in allele frequency were identified with the Cochran-Mantel-Haenszel (CMH) test by Orozco-Ter Wengel et al. (2012)

  • In addition to taking an arbitrary threshold of the top 2000 SNPs, we considered the full distribution of p-values for the CMH test and the full distribution of Bayes factors for the BBGP-based tests

  • For each Gene Ontology (GO) category, we compared the distribution of all SNP values (p-values for the CMH test and Bayes factors for the GP) in that GO gene set to the distribution outside that gene set using a one-tailed Mann-Whitney U test (MWU) as applied by Segre et al. (2010)

Summary

Tests of parameter choice for experimental design

As whole-genome simulations are computationally very demanding, we decided to simulate only a single chromosome arm (2L) with 25 selected SNPs using various parameter settings. This reduces the running times significantly, but the length of the genome segment (∼16 Mb) and the number of selected SNPs are still a realistic proxy for performance on the whole genome.

Orozco-Ter Wengel et al. (2012) used Gowinda (Kofler and Schlotterer, 2012) to test the significance of overrepresentation of candidate SNPs in each GO category. For each GO category, we compared the distribution of all SNP values (p-values for the CMH test and Bayes factors for the GP) in that GO gene set to the distribution outside that gene set using a one-tailed Mann-Whitney U test (MWU) as applied by Segre et al. (2010). The top-ranked candidate categories were similar in both cases (see Tables S3 and S4).
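The gene-set comparison described above can be sketched as a one-tailed Mann-Whitney U test between SNP scores inside and outside a GO category. The Python sketch below is a minimal standard-library version using the normal approximation with tie and continuity corrections; the function names and the example scores are hypothetical, and it is not the procedure from Gowinda or from the BBGP software.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mwu_greater(in_set, outside):
    """One-tailed MWU test of whether in_set scores tend to exceed outside.

    Returns (U, p) with p from the normal approximation, which is only
    reasonable for moderately large samples.
    """
    n1, n2 = len(in_set), len(outside)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = list(in_set) + list(outside)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction for the variance of U.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2.0 - 0.5) / math.sqrt(var)  # continuity correction
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u1, p


# Hypothetical log Bayes factors for SNPs inside and outside one GO category.
go_scores = [3.1, 4.0, 5.2, 6.3]
background = [0.5, 1.0, 1.5, 2.0, 2.5]
u_stat, p_value = mwu_greater(go_scores, background)
```

For scores where larger means more significant, such as Bayes factors, the "greater" direction is the natural one. For CMH p-values, where smaller means more significant, the same function could be applied to a transformed score such as the negative logarithm of the p-value.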

Gene Set Enrichment with Gowinda
Gene Set Enrichment with Mann-Whitney U Test