Abstract
Open chromatin regions (OCRs) are regions of the human genome that are accessible to DNA regulatory elements. Several studies have reported that specific OCRs are associated with mechanisms involved in human diseases, such as cancers. Identifying OCRs using ATAC-seq or DNase-seq is often expensive. Detecting OCRs from plasma cell-free DNA (cfDNA) sequencing data has therefore become popular, because both the fragmentation patterns of cfDNA and the sequencing coverage in OCRs differ significantly from those in other regions. However, accurately detecting OCRs from plasma cfDNA-seq data is a challenging computational problem, because multiple factors (e.g., sequencing and mapping bias, insufficient read depth) often mislead the computational model. In this paper, we propose a novel bioinformatics pipeline, OCRDetector, for detecting OCRs from whole-genome cfDNA sequencing data. The pipeline calculates the window protection score (WPS) waveform and the cfDNA sequencing coverage. To validate the proposed pipeline, we compared the percentage overlap of our OCRs with those obtained by other methods. The experimental results show that 81% of the TSS regions of housekeeping genes are detected, and that the detected OCRs show clear tissue specificity. In addition, the overlap percentage between our OCRs and the high-confidence OCRs obtained by ATAC-seq or DNase-seq is greater than 70%.
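To make the two signals named in the abstract concrete, the following is a minimal sketch of how a per-position WPS and fragment coverage could be computed from cfDNA fragment coordinates. It is not the OCRDetector implementation. It assumes the commonly used definition of WPS (fragments fully spanning a window centered on a position, minus fragments with an endpoint inside that window). The function names, the half-open coordinate convention, and the default 120 bp window are illustrative assumptions that are not stated in the text above.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end), half-open, on a single chromosome


def window_protection_score(fragments: List[Fragment], start: int, end: int,
                            window: int = 120) -> List[int]:
    """WPS for every position in [start, end).

    For each position, a window of `window` bp is centered on it. The score is
    the number of fragments that span the whole window minus the number of
    fragments that have an endpoint inside the window. The window size is an
    assumed default, not a value taken from the paper.
    """
    half = window // 2
    scores = []
    for pos in range(start, end):
        w_start, w_end = pos - half, pos + half  # half-open window
        spanning = 0
        ending = 0
        for f_start, f_end in fragments:
            if f_start <= w_start and f_end >= w_end:
                spanning += 1  # fragment protects the entire window
            elif w_start <= f_start < w_end or w_start < f_end <= w_end:
                ending += 1    # fragment starts or ends inside the window
        scores.append(spanning - ending)
    return scores


def coverage(fragments: List[Fragment], start: int, end: int) -> List[int]:
    """Number of fragments covering each position in [start, end)."""
    return [sum(1 for f_start, f_end in fragments if f_start <= pos < f_end)
            for pos in range(start, end)]


if __name__ == "__main__":
    # Toy fragments, invented purely for illustration.
    frags = [(0, 200), (95, 300)]
    print(window_protection_score(frags, 100, 101, window=20))
    print(coverage(frags, 100, 101))
```

This quadratic loop is only meant to show the definition. On whole-genome data one would read fragments from an aligned BAM file and use prefix sums or an interval index instead.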
Highlights
Open chromatin regions (OCRs) are the regions of the human genome that can be contacted by DNA regulatory elements [1,2]
To evaluate the performance of our proposed pipeline, we first obtained the OCRs of the hematopoietic lineage of healthy people and expanded them into a 600 bp window
We identify candidate OCRs and verify whether a certain number of regular window protection score (WPS) peaks are present around each region
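The validation step described in the highlights (expanding reference OCRs to 600 bp windows and measuring overlap with detected regions) can be sketched as follows. This is an illustrative sketch only. The source does not say how the expansion is anchored or how overlap is counted, so centering the window on the region midpoint and counting a reference window as recovered when any detected interval overlaps it by at least one base are both assumptions. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, on a single chromosome


def expand_to_window(center: int, size: int = 600) -> Interval:
    """Return a `size`-bp window centered on `center` (assumed anchoring)."""
    half = size // 2
    return (max(0, center - half), center + half)


def overlap_percentage(detected: List[Interval],
                       reference: List[Interval]) -> float:
    """Percentage of reference intervals overlapped by any detected interval.

    Two half-open intervals overlap when each starts before the other ends.
    """
    if not reference:
        return 0.0
    hits = sum(
        1
        for r_start, r_end in reference
        if any(d_start < r_end and r_start < d_end
               for d_start, d_end in detected)
    )
    return 100.0 * hits / len(reference)


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    reference = [expand_to_window(c) for c in (1000, 5000, 9000, 13000)]
    detected = [(1200, 1500), (5299, 5400)]
    print(overlap_percentage(detected, reference))
```

For genome-scale comparisons, a sorted sweep or an interval tree per chromosome would replace the nested loop. Established tools such as bedtools serve the same purpose.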
Summary
Open chromatin regions (OCRs) are the regions of the human genome that can be contacted by DNA regulatory elements [1,2]. Chromatin accessibility affects gene expression in tissue cells and therefore has important regulatory effects on human physiological activities. A recent study reported that different cancers may show different maps of OCRs [3]. By identifying cancer-specific or tissue-specific OCRs, it is possible to study the epigenetic mechanisms of cancers, predict potential markers, and analyze tumor heterogeneity and subtypes [4,5]. Correlation analysis of gene expression and OCRs reveals possible interactions between distant regulatory elements and gene promoters, including driver oncogenes and targets in cancer immunotherapy [5,6,7].