Abstract

While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error (G-to-A) dominates the results. This error persists among mismatches with 99% base-call accuracy and vanishes only at 99.99% accuracy or higher. The error appears to have entered about 1% of the HapMap, possibly affecting other users who rely on this resource. Further investigation using stringent quality thresholds uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidate RNA editing in human and mouse, and reveal, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive is a valuable resource for investigating DNA and RNA editing and sets the stage for comprehensive mapping of editing events in large-scale genomic datasets.
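The two accuracy levels in the abstract correspond to points on the Phred scale used for Sanger trace base calls: 99% accuracy is Q20 and 99.99% is Q40, since accuracy = 1 - 10^(-Q/10). The sketch below is a minimal illustration, not the authors' pipeline. It assumes gap-free aligned reference and read strings, one Phred score per read base, and a hypothetical `min_size` cutoff for what counts as a cluster. It shows how raising the quality threshold can dissolve a cluster of same-type mismatches.

```python
from collections import Counter
import math


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call with Phred quality q is correct: 1 - 10^(-q/10)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Inverse mapping: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1.0 - accuracy)


def same_type_cluster(ref: str, read: str, quals: list[int], min_qual: int, min_size: int):
    """Return the dominant mismatch type, e.g. ('G', 'A'), if at least min_size
    mismatches of that single type pass the quality filter; otherwise None.

    Assumption (hypothetical criterion): a 'cluster' is simply min_size or more
    same-type mismatches within one read, ignoring their spacing.
    """
    types = Counter(
        (r, b)
        for r, b, q in zip(ref, read, quals)
        if r != b and q >= min_qual  # keep only mismatches at or above the threshold
    )
    if not types:
        return None
    mtype, count = types.most_common(1)[0]
    return mtype if count >= min_size else None


# Toy example: four G-to-A mismatches, two of them with lower base quality.
ref = "AGAGGAGTGA"
read = "AAAAGAATAA"
quals = [40, 40, 40, 40, 40, 40, 25, 40, 20, 40]

print(same_type_cluster(ref, read, quals, min_qual=20, min_size=3))  # ('G', 'A')
print(same_type_cluster(ref, read, quals, min_qual=30, min_size=3))  # None
```

At Q20 all four G-to-A mismatches pass and form a cluster; at Q30 only two remain, so the toy cluster disappears. Real clustering criteria would likely also consider mismatch spacing and read length.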

Highlights

  • With the exception of infrequent random somatic mutations, it is widely believed that the same genomic content should be fixed in an organism throughout its lifetime

  • RNA editing is performed by the adenosine deaminase that acts on RNA (ADAR) family of deaminases [2,3,4,5], and this process has been implicated in several vital neurological functions [6]

  • We found candidates for DNA and RNA editing as well as a sequencing error that has become incorporated into commonly used genomic resources

Introduction

With the exception of infrequent random somatic mutations, it is widely believed that the same genomic content should be fixed in an organism throughout its lifetime. This information serves as a template for exact RNA copies. RNA editing, however, alters particular RNA nucleotides by changing adenosine (A) into inosine (I), which is in turn read as guanosine (G) [1]. It is performed by the adenosine deaminase that acts on RNA (ADAR) family of deaminases [2,3,4,5], and this process has been implicated in several vital neurological functions [6]. A different family of proteins, the AID/APOBEC family of deaminases, can edit both DNA and RNA nucleotides, changing cytosine (C) into uracil (U) [13]. Activation-induced cytidine deaminase (AID) was discovered to be vital for the antigen-driven diversification of immunoglobulin genes in the vertebrate adaptive immune system [17,18,19], and the APOBEC3s were shown to be involved in restricting retrovirus proliferation in primates [20,21].
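Because inosine is read as G and uracil is read as T, each deamination leaves a characteristic mismatch against the reference. When the orientation of a read relative to the edited strand is not known, the same event can also appear as its complement: A-to-G as T-to-C, and C-to-T as G-to-A. The sketch below is an illustrative assumption, not the paper's classification scheme; the signature labels and function names are hypothetical. It also shows why a systematic G-to-A sequencing error is a confounder, since G-to-A is how C-to-U editing looks from the opposite strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical labels: each deamination and the (reference, read) mismatch it produces.
# Inosine is read as guanosine; uracil is read as thymine in sequenced DNA/cDNA.
EDITING_SIGNATURES = {
    "ADAR (A-to-I)": ("A", "G"),
    "AID/APOBEC (C-to-U)": ("C", "T"),
}


def observed_mismatches(ref_base: str, read_base: str) -> set[tuple[str, str]]:
    """Both ways a single-base change can appear: on the edited strand and on its complement."""
    return {
        (ref_base, read_base),
        (COMPLEMENT[ref_base], COMPLEMENT[read_base]),
    }


def classify(ref_base: str, read_base: str) -> list[str]:
    """Return the editing signatures consistent with a mismatch on either strand."""
    return [
        name
        for name, (r, b) in EDITING_SIGNATURES.items()
        if (ref_base, read_base) in observed_mismatches(r, b)
    ]


print(classify("G", "A"))  # ['AID/APOBEC (C-to-U)']
print(classify("T", "C"))  # ['ADAR (A-to-I)']
print(classify("A", "C"))  # []
```

A single mismatch type cannot by itself distinguish genuine editing from a sequencing error that produces the same substitution, which is why quality filtering and chromatogram inspection matter.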
