Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model

Seong Kyu Han, Yoshiharu Muto, Parker C Wilson, Benjamin D Humphreys, Matthew G Sampson, Aravinda Chakravarti, Dongwon Lee

doi:10.1073/pnas.2212810119

Abstract

Chromatin accessibility assays are central to the genome-wide identification of gene regulatory elements associated with transcriptional regulation. However, the data have highly variable quality arising from several biological and technical factors. To surmount this problem, we developed a sequence-based machine learning method to evaluate and refine chromatin accessibility data. Our framework, gapped k-mer SVM quality check (gkmQC), provides the quality metrics for a sample based on the prediction accuracy of the trained models. We tested 886 DNase-seq samples from the ENCODE/Roadmap projects to demonstrate that gkmQC can effectively identify "high-quality" (HQ) samples with low conventional quality scores owing to marginal read depths. Peaks identified in HQ samples are more accurately aligned at functional regulatory elements, show greater enrichment of regulatory elements harboring functional variants, and explain greater heritability of phenotypes from their relevant tissues. Moreover, gkmQC can optimize the peak-calling threshold to identify additional peaks, especially for rare cell types in single-cell chromatin accessibility data.
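The idea of scoring a sample by how well a sequence model trained on its peaks predicts held-out peaks can be illustrated with a minimal sketch. This is not the gkmQC implementation: it replaces the gapped k-mer SVM with a toy log-odds scorer over contiguous k-mers, and every function name, parameter, and sequence below is hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count contiguous k-mers in a DNA sequence (no gaps; a simplification)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_log_odds(pos_seqs, neg_seqs, k):
    """Toy stand-in for model training: per-k-mer log-odds with add-one smoothing."""
    pos, neg = Counter(), Counter()
    for s in pos_seqs:
        pos.update(kmer_counts(s, k))
    for s in neg_seqs:
        neg.update(kmer_counts(s, k))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        m: math.log((pos[m] + 1) / pos_total) - math.log((neg[m] + 1) / neg_total)
        for m in vocab
    }


def score(seq, weights, k):
    """Sum the weights of the k-mers in seq; unseen k-mers contribute zero."""
    return sum(weights.get(m, 0.0) * c for m, c in kmer_counts(seq, k).items())


def auc(pos_scores, neg_scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    ranked correctly, with ties counted as half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def quality_score(train_pos, train_neg, test_pos, test_neg, k=4):
    """Hypothetical quality metric: held-out AUC of the toy sequence model."""
    w = train_log_odds(train_pos, train_neg, k)
    return auc(
        [score(s, w, k) for s in test_pos],
        [score(s, w, k) for s in test_neg],
    )


# Made-up sequences: "peaks" share a GATA motif, background sequences do not.
toy_quality = quality_score(
    train_pos=["AAGATAAA", "CCGATACC"],
    train_neg=["AACCTTAA", "CCTTGGCC"],
    test_pos=["TTGATATT"],
    test_neg=["TTCCGGTT"],
)
```

In this sketch, a sample whose peaks carry consistent sequence signal yields a held-out AUC near 1, while a sample whose peaks are mostly noise would yield a value near 0.5. That is the intuition behind using prediction accuracy as a quality metric.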

Journal: Proceedings of the National Academy of Sciences of the United States of America
Publication Date: Dec 12, 2022
Citations: 4
License type: CC BY-NC-ND 4.0

