Abstract

Background

There is growing evidence that DNA methylation alterations may contribute to carcinogenesis. Recent data also suggest that DNA methylation field defects in normal pre-neoplastic tissue represent infrequent stochastic “outlier” events. This presents a statistical challenge for standard feature selection algorithms, which assume frequent alterations in a disease phenotype. Although differential variability has emerged as a novel feature selection paradigm for the discovery of outliers, a growing concern is that these could result from technical confounders, in principle thus favouring algorithms which are robust to outliers.

Results

Here we evaluate five differential variability algorithms in over 700 DNA methylomes, including two of the largest cohorts profiling precursor cancer lesions, and demonstrate that most of the novel proposed algorithms lack the sensitivity to detect epigenetic field defects at genome-wide significance. In contrast, algorithms which recognise heterogeneous outlier DNA methylation patterns are able to identify many sites in pre-neoplastic lesions, which display progression in invasive cancer. Thus, we show that many DNA methylation outliers are not technical artefacts, but define epigenetic field defects which are selected for during cancer progression.

Conclusions

Given that cancer studies aiming to find epigenetic field defects are likely to be limited by sample size, adopting the novel feature selection paradigm advocated here will be critical to increase assay sensitivity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1056-z) contains supplementary material, which is available to authorized users.

Highlights

  • There is growing evidence that DNA methylation alterations may contribute to carcinogenesis

  • The DNA methylation (DNAm) data for this set were generated on Illumina 27 k beadarrays, and so, because of the design of the 27 k array, we only focused on differentially variable CpG (DVC) which exhibited increases in DNAm in the precursor lesions

  • We decided to compare a total of 5 differential variability (DV) algorithms, with four of these having been proposed recently: (i) Bartlett’s test (BT) [13], (ii) a joint test for differential means and differential variance in DNA methylation (“JDMDV”) [20], (iii) an empirical Bayes Levene-type test (“DiffVar”) [19] and (iv) a test based on a generalized additive model for location and scale (“GAMLSS”) [21]
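To make the idea of a differential variability test concrete, the following is a minimal sketch of Bartlett's test applied to a single CpG across two groups. It is an illustration only, not the authors' pipeline: the function name `bartlett_two_groups` and the beta-values in `normal` and `precursor` are hypothetical, and the p-value shortcut assumes exactly two groups (one degree of freedom).

```python
import math
import statistics


def bartlett_two_groups(x, y):
    """Bartlett's test for equal variances between two groups.

    Returns (statistic, p_value). With two groups the statistic is
    compared against a chi-square distribution with 1 degree of freedom.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances
    if v1 == 0 or v2 == 0:
        raise ValueError("variances must be non-zero")
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
    numerator = (dof * math.log(pooled)
                 - (n1 - 1) * math.log(v1)
                 - (n2 - 1) * math.log(v2))
    # Bartlett's small-sample correction factor for k = 2 groups.
    correction = 1 + (1 / (n1 - 1) + 1 / (n2 - 1) - 1 / dof) / 3
    stat = max(numerator / correction, 0.0)  # guard against rounding below zero
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical beta-values for one CpG (made up for illustration).
normal = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
# Most precursor samples look normal; two are hypermethylated outliers.
precursor = [0.10, 0.11, 0.12, 0.09, 0.10, 0.11, 0.55, 0.62]

stat, p_value = bartlett_two_groups(normal, precursor)
```

In this toy example a small number of outlier samples is enough to inflate the variance of the precursor group, so the test flags the site even though most precursor samples are indistinguishable from normal. Bartlett's test is known to be sensitive to outliers and to departures from normality, which is the property at issue when weighing it against outlier-robust alternatives.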


Summary

Introduction

There is growing evidence that DNA methylation alterations may contribute to carcinogenesis. Recent data suggest that DNA methylation field defects in normal pre-neoplastic tissue represent infrequent stochastic “outlier” events. This presents a statistical challenge for standard feature selection algorithms, which assume frequent alterations in a disease phenotype. Although differential variability has emerged as a novel feature selection paradigm for the discovery of outliers, a growing concern is that these could result from technical confounders, which in principle favours algorithms that are robust to outliers. Feature selection presents an important statistical challenge in the analysis of omic data [1,2,3]. It is most often encountered in the context of supervised analyses where one wishes to find features that are informative of differences between two phenotypes of interest (POI). A number of DV tests have emerged, with improved

