Abstract

Deep sequencing after bisulfite conversion (BS-Seq) is the method of choice for generating whole genome maps of cytosine methylation at single base-pair resolution. Its application to genomic DNA from Arabidopsis flower bud tissue produced the first complete methylome and determined a methylation rate of 6.7% in this tissue. In that study, BS-Seq reads were mapped onto an in silico converted reference genome using the so-called 3-letter genome method. Here, we present BiSS (Bisulfite Sequencing Scorer), a new method that applies Smith-Waterman alignment to map bisulfite-converted reads to a reference genome. In addition, we introduce a comprehensive adaptive error estimate that accounts for sequencing errors, erroneous bisulfite conversion and wrongly mapped reads. Re-analysis of the Arabidopsis methylome data with BiSS mapped substantially more reads to the genome. As a result, it determines the methylation status of an extra 10% of cytosines and estimates the methylation rate to be 7.7%. We validated the results by traditional bisulfite sequencing of selected individual genomic regions. In addition to predicting the methylation status of each cytosine, BiSS provides an estimate of the degree of methylation at each genomic site. Thus, BiSS exploits BS-Seq data more extensively and provides more information for downstream analysis.
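To illustrate how a Smith-Waterman alignment can tolerate bisulfite conversion, the following is a minimal sketch and not the BiSS implementation. The function names `bs_match` and `smith_waterman_bs` and the match, mismatch and gap values are assumptions chosen for illustration; the scoring scheme actually used by BiSS is not given here. The only bisulfite-specific element is that a reference C aligned to a read T is scored as a match, while the reverse (reference T, read C) remains a mismatch. Only the forward strand is handled.

```python
def bs_match(ref_base: str, read_base: str) -> bool:
    """Treat identical bases as a match, and also reference C against read T
    (an unmethylated cytosine that bisulfite treatment converted)."""
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def smith_waterman_bs(read: str, reference: str,
                      match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment score of a bisulfite read against a reference.

    Returns (best_score, end), where end is the exclusive end index of the
    best local alignment on the reference. Scoring values are illustrative
    assumptions, not the parameters used by BiSS.
    """
    rows, cols = len(read) + 1, len(reference) + 1
    # Dynamic programming matrix, first row and column stay at zero.
    scores = [[0] * cols for _ in range(rows)]
    best, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if bs_match(reference[j - 1], read[i - 1]) else mismatch
            scores[i][j] = max(
                0,                          # start a new local alignment
                scores[i - 1][j - 1] + s,   # align the two bases
                scores[i - 1][j] + gap,     # gap in the reference
                scores[i][j - 1] + gap,     # gap in the read
            )
            if scores[i][j] > best:
                best, best_end = scores[i][j], j
    return best, best_end


if __name__ == "__main__":
    reference = "GATCGTACCGA"
    read = "TTGTAC"  # reference[2:8] with the first C converted to T
    print(smith_waterman_bs(read, reference))
```

Because the alignment is local and allows mismatches and gaps, a read with a sequencing error can still be placed, which an exact-match lookup would reject.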

Highlights

  • Whole genome sequencing of numerous species and individuals has considerably expanded our understanding of biological diversity and evolution, and of normal and abnormal phenotypes

  • Not for the first time, pioneering epigenetic research was performed in plants, as the first whole methylome was established for Arabidopsis thaliana [3,4]

  • We noticed a discrepancy between the total mC content calculated after BS-Seq and the frequency estimated from counting cytosines occurring in the raw data from the short-read libraries [4]


Summary

Introduction

Whole genome sequencing of numerous species and individuals has considerably expanded our understanding of biological diversity and evolution, and of normal and abnormal phenotypes. While the discrepancy with previously published values could have been due in part to limited sequencing and unequal coverage, we suspected that it originated mostly from limited mapping of individual short reads to the reference genome, since only 78.5% of genomic cytosines were covered by at least 2 mapped reads [4]. This could have been due to the mapping procedure, the so-called three-letter genome method: all genomic cytosines are converted in silico to thymine before the reads are mapped using ELAND (software from Illumina), and C-T mismatches are interpreted as indicative of methylated cytosines during the downstream analysis [4]. These data show that the BiSS method provides an accurate estimate of the degree of methylation at individual partially methylated genomic sites.
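The three-letter genome method described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ELAND pipeline: the function names `c_to_t`, `map_three_letter` and `call_methylation` are hypothetical, only the forward strand is considered, both read and reference are converted before lookup, and mapping is reduced to exact substring matching.

```python
def c_to_t(seq: str) -> str:
    """Collapse a sequence to a three-letter alphabet by replacing C with T."""
    return seq.upper().replace("C", "T")


def map_three_letter(read: str, reference: str) -> list:
    """Return every forward-strand offset at which the converted read
    matches the converted reference exactly (assumption: no mismatches)."""
    conv_ref = c_to_t(reference)
    conv_read = c_to_t(read)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def call_methylation(read: str, reference: str, offset: int) -> dict:
    """Compare the original read with the original reference at a mapped
    offset. For each reference cytosine covered by the read, record True if
    the read still shows C (methylated) and False if it shows T (converted)."""
    calls = {}
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if reference[pos].upper() == "C" and base in "CT":
            calls[pos] = base == "C"
    return calls


if __name__ == "__main__":
    reference = "GATCGTACCGA"
    read = "TTGTAC"  # reference[2:8]; C at position 3 converted, C at 7 retained
    for offset in map_three_letter(read, reference):
        print(offset, call_methylation(read, reference, offset))
```

Collapsing C and T removes information, so a converted read can match several genomic locations equally well or fail to match at all when it contains an error. Either case leaves cytosines without a methylation call, which is consistent with the limited coverage discussed above.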

