Abstract

Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. Removing reads and signal that fall in ENCODE blacklist regions is an essential quality measure when analyzing functional genomics data.

Highlights

  • Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome

  • To identify regions for inspection, our method searches for regions that show the signature of being present in multiple copies and that are overrepresented in control “input” sequences. These “input” datasets were generated as controls for ChIP-seq experiments using randomly sheared DNA from non-immunoprecipitated chromatin

  • ENCODE uses the share of reads falling into blacklisted regions as a quality control metric; in some experiments, up to 87% of reads fall into blacklisted regions[5]
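The quality control metric in the last highlight, the fraction of reads that fall into blacklisted regions, can be sketched in a few lines. This is a minimal illustration and not the ENCODE pipeline: the function names (`build_index`, `overlaps`, `blacklist_fraction`), the choice to count a read as soon as it overlaps a blacklisted interval by at least one base, and the toy coordinates are all assumptions made for the example. Intervals are taken to be 0-based and half-open, as in BED files.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(blacklist):
    """Sort and merge (chrom, start, end) intervals per chromosome.

    Returns {chrom: (starts, ends)} with disjoint, sorted intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps the index."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def blacklist_fraction(reads, blacklist):
    """Fraction of reads overlapping any blacklisted interval."""
    reads = list(reads)
    if not reads:
        return 0.0
    index = build_index(blacklist)
    hits = sum(overlaps(index, c, s, e) for c, s, e in reads)
    return hits / len(reads)


# Toy data (made up for illustration, not real blacklist coordinates).
toy_blacklist = [("chrM", 0, 100), ("chr1", 500, 600)]
toy_reads = [
    ("chrM", 10, 60),    # inside a blacklisted interval
    ("chr1", 590, 640),  # overlaps the end of one
    ("chr1", 600, 650),  # starts exactly where it ends: no overlap
    ("chr2", 0, 50),     # chromosome without blacklist entries
]
print(blacklist_fraction(toy_reads, toy_blacklist))  # 0.5
```

In practice the reads would come from an alignment file and the intervals from a published blacklist BED file; the sketch leaves both parsing steps out so that the overlap logic stays visible.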


Summary

Regions of the Genome

Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. The original ENCODE blacklist, termed the Duke Excluded Regions (DER), was manually curated on the Homo sapiens (human) genome assembly GRCh37 (hereafter referred to as hg19) to cover a large number of repeat elements in the genome, rRNA, alpha satellites, and other simple repeats. This list was later updated to include regions of high signal that presumably represent unannotated repeats in the genome; the updated list is referred to as the ENCODE Data Analysis Center (DAC) blacklisted regions. Removing these regions eliminated significant background noise that would otherwise have been attributed to biological variation[2].
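As an illustration of what removing these regions means in practice, the sketch below drops every feature (for example, a called peak) that overlaps a blacklisted interval. It is a simplified stand-in for dedicated interval tools, not the procedure used by ENCODE: the helper names `parse_bed` and `remove_blacklisted`, the rule of discarding a feature on any overlap, and the toy coordinates are assumptions made for the example. Coordinates are 0-based and half-open, as in BED files.

```python
def parse_bed(text):
    """Parse BED-like text into (chrom, start, end) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        records.append((chrom, int(start), int(end)))
    return records


def remove_blacklisted(features, blacklist):
    """Keep only features that overlap no blacklisted interval.

    Uses a simple nested scan, which is fine for small inputs; large
    datasets would call for a sorted sweep or an interval tree.
    """
    kept = []
    for chrom, start, end in features:
        hit = any(
            b_chrom == chrom and b_start < end and start < b_end
            for b_chrom, b_start, b_end in blacklist
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept


# Toy data (made up for illustration, not real blacklist coordinates).
toy_peaks = parse_bed(
    "# hypothetical peaks\n"
    "chr1\t100\t200\n"
    "chr1\t550\t650\n"
    "chr2\t10\t20\n"
)
toy_blacklist = [("chr1", 600, 700)]

clean_peaks = remove_blacklisted(toy_peaks, toy_blacklist)
print(clean_peaks)  # [('chr1', 100, 200), ('chr2', 10, 20)]
```

Discarding a whole feature on any overlap is the conservative choice; an alternative would be to trim features to their non-blacklisted part, which keeps more data but can leave fragments that are hard to interpret.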

