Theoretical characterisation of strand cross-correlation in ChIP-seq

Hayato Anzawa, Hitoshi Yamagata, Kengo Kinoshita

doi:10.1186/s12859-020-03729-6

Open Access

https://doi.org/10.1186/s12859-020-03729-6


Abstract

Background: Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of the signal-to-noise ratio (S/N), which stems from their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure.

Results: We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of the cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the total number of mapped reads and to the square of the ratio of signal reads, and inversely proportional to the number of peaks and to the length of read-enriched regions. Simulation analysis supported our results, and an evaluation using 790 ChIP-seq datasets obtained from a public database demonstrated high consistency between the calculated cross-correlation coefficients and the coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability-bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Based on these insights, we proposed virtual S/N (VSN), a novel peak call-free metric for S/N assessment. We also developed PyMaSC, a tool to calculate strand cross-correlation and VSN efficiently. VSN achieved the most consistent S/N estimation across various ChIP targets and sequencing read depths. Furthermore, we demonstrated that combining VSN with pre-existing peak calling results enables estimation of the number of detectable peaks for subsequent experiments and assessment of peak calling results.

Conclusions: We present the first theoretical insights into strand cross-correlation, and the results reveal both the potential and the limitations of strand cross-correlation analysis. Our quality assessment framework using VSN provides peak call-independent QC and will help in the evaluation of peak call analysis in ChIP-seq experiments.
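The proportionality stated in the Results can be written compactly. The symbols below are illustrative assumptions introduced here, not the paper's notation: N is the total number of mapped reads, ρ the ratio of signal reads, n the number of peaks, L the length of a read-enriched region, and CC(d) the cross-correlation coefficient at strand shift d. The abstract gives only the proportionality, so no constant factor is implied.

```latex
\[
  \max_{d}\, \mathrm{CC}(d) \;\propto\; \frac{N\,\rho^{2}}{n\,L}
\]
```

Read this way, doubling the sequencing depth N would double the maximum coefficient, whereas doubling the ratio of signal reads ρ would quadruple it, for a typical sample with n and L held fixed.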

Highlights

Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis
It has been unclear what strand cross-correlation-based QC metrics actually measure, and what their potential and limitations are, even though analyses of real data support a correlation between these metrics and chromatin immunoprecipitation (ChIP) quality
The results revealed that the maxima of the naive cross-correlation (NCC) and mappability-sensitive cross-correlation (MSCC) coefficients can be regarded as functions of the total number of mapped reads, the enriched region length, the total number of binding events and the signal-to-noise ratio (S/N) parameter
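To make the notion of a strand cross-correlation profile concrete, the following is a minimal sketch, not PyMaSC's implementation. It assumes per-base counts of read 5′ ends on each strand and computes the Pearson correlation between the forward-strand counts and the reverse-strand counts shifted by d bases. The simulation parameters (genome length, number of binding sites, reads per site, positional jitter, background read count) and all function names are hypothetical values chosen for illustration only.

```python
import numpy as np


def strand_cross_correlation(forward, reverse, max_shift):
    """Pearson correlation between forward counts and reverse counts
    shifted by d bases, for d = 0 .. max_shift."""
    forward = np.asarray(forward, dtype=float)
    reverse = np.asarray(reverse, dtype=float)
    n = len(forward)
    profile = np.empty(max_shift + 1)
    for d in range(max_shift + 1):
        # Pair forward position i with reverse position i + d.
        profile[d] = np.corrcoef(forward[: n - d], reverse[d:])[0, 1]
    return profile


# Toy simulation (all parameters are illustrative assumptions).
rng = np.random.default_rng(0)
genome_len, fragment_len = 100_000, 150
half = fragment_len // 2
sites = rng.integers(1_000, genome_len - 1_000, size=100)


def simulate_strand(offset):
    """Counts of read 5' ends: signal reads clustered around each site
    (displaced by `offset`) plus uniformly placed background reads."""
    signal = np.repeat(sites, 50) + offset + rng.normal(0, 20, size=sites.size * 50)
    noise = rng.integers(0, genome_len, size=2_500)
    positions = np.concatenate([np.rint(signal).astype(int), noise])
    return np.bincount(positions, minlength=genome_len)


forward = simulate_strand(-half)  # forward 5' ends lie upstream of the site
reverse = simulate_strand(half)   # reverse 5' ends lie downstream of the site

profile = strand_cross_correlation(forward, reverse, max_shift=400)
best_shift = int(np.argmax(profile))
print("shift with maximum cross-correlation:", best_shift)
```

In this toy setting the profile should peak near the simulated fragment length, because that is the shift at which the forward and reverse read clusters around each binding site overlap. Changing the number of background reads or the number of sites is a quick way to see how the height of the maximum responds.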

Summary

Introduction

Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of the signal-to-noise ratio (S/N), which stems from their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles measure. Estimation of the S/N is often performed by focusing on the success of the immunoprecipitation and appraising the number of detectable peaks. Among such methodologies, a common metric for evaluating the S/N of ChIP-seq samples is the fraction of reads in peaks (FRiP), the fraction of all mapped reads that fall within called peaks.
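As a small illustration of FRiP, here is a minimal sketch assuming the commonly used definition (reads inside called peaks divided by all mapped reads), a single chromosome, one position per read, and half-open peak intervals [start, end). The function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak."""
    if not read_positions:
        return 0.0
    merged = merge_intervals(peaks)
    starts = [start for start, _ in merged]
    in_peaks = 0
    for pos in read_positions:
        # Find the last merged peak starting at or before this read.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)


reads = [5, 12, 18, 40, 55, 90]
peaks = [(10, 20), (15, 25), (50, 60)]
print(frip(reads, peaks))
```

Because the value depends on which peaks were called, FRiP is tied to the peak caller and its settings, which is the dependence that a peak call-free metric aims to avoid.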

Journal: BMC Bioinformatics	Publication Date: Sep 22, 2020
Citations: 4	License type: open-access
