Abstract
Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. Removing ENCODE blacklist regions is an essential quality measure when analyzing functional genomics data.
Highlights
Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome
To identify regions for inspection, our method searches for regions that show the signature of being present in multiple copies and that are overrepresented in control “input” sequences. These “input” datasets were generated as controls for ChIP-seq experiments using randomly sheared DNA from non-immunoprecipitated chromatin
ENCODE uses the fraction of reads falling into blacklisted regions as a quality control metric; in some experiments, up to 87% of reads fall into these regions[5]
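The quality control metric in the last highlight, the fraction of reads that fall into blacklisted regions, can be sketched as follows. This is a minimal illustration and not ENCODE's pipeline: the function names (`build_index`, `overlaps`, `blacklist_fraction`), the in-memory tuples, and all coordinates are assumptions made for the example. Intervals are treated as 0-based and half-open, as in BED files.

```python
from bisect import bisect_right


def build_index(blacklist):
    """Group (chrom, start, end) intervals by chromosome, then sort and merge them."""
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """Return True if the half-open interval [start, end) hits any blacklisted interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the query ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def blacklist_fraction(reads, index):
    """Fraction of reads (chrom, start, end) overlapping the blacklist."""
    if not reads:
        return 0.0
    hits = sum(1 for chrom, start, end in reads if overlaps(index, chrom, start, end))
    return hits / len(reads)


# Made-up coordinates, for illustration only.
example_blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
example_reads = [("chr1", 90, 101), ("chr1", 300, 350), ("chr2", 10, 20), ("chr3", 0, 10)]
example_index = build_index(example_blacklist)
print(blacklist_fraction(example_reads, example_index))
```

Merging the intervals first lets each read be checked with a single binary search, which matters when the read count runs into the millions. A real analysis would read alignments from a BAM file rather than from tuples.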
Summary
Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. The original ENCODE blacklist, termed the Duke Excluded Regions (DER), was manually curated on the Homo sapiens (human) genome assembly GRCh37 (hereafter referred to as hg19) to cover a large number of repeat elements in the genome, rRNA, alpha satellites, and other simple repeats. This list was later updated, under the name ENCODE Data Analysis Center (DAC) blacklisted regions, to include regions of high signal that presumably represent unannotated repeats in the genome. Removing these regions eliminated significant background noise that would otherwise have been attributed to biological variation[2].
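As an illustration of removing blacklisted regions before downstream analysis, the sketch below drops any called region (for example, a ChIP-seq peak) that overlaps a blacklist interval. The helper names and coordinates are hypothetical. Discarding a region on any overlap is one possible policy assumed here, not necessarily the one used by the authors.

```python
def overlaps_any(chrom, start, end, blacklist):
    """True if half-open [start, end) on chrom overlaps any blacklist interval."""
    return any(
        b_chrom == chrom and b_start < end and start < b_end
        for b_chrom, b_start, b_end in blacklist
    )


def remove_blacklisted(regions, blacklist):
    """Keep only the regions that do not touch the blacklist."""
    return [
        (chrom, start, end)
        for chrom, start, end in regions
        if not overlaps_any(chrom, start, end, blacklist)
    ]


# Made-up coordinates, for illustration only.
demo_blacklist = [("chr1", 1000, 2000)]
demo_peaks = [("chr1", 500, 900), ("chr1", 1900, 2100), ("chr2", 1000, 2000)]
kept_peaks = remove_blacklisted(demo_peaks, demo_blacklist)
print(kept_peaks)
```

The linear scan keeps the example short. For genome-scale inputs, an interval index or an existing tool such as `bedtools intersect -v` is a more practical choice.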