Abstract

Summary: Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows researchers both to evaluate the reproducibility of biological or technical replicates and to compare different datasets to identify potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers.

Availability and Implementation: An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Computational assessment of reproducibility across nucleic acid sequencing data is a pivotal component in genomic studies

  • We propose a new statistic to summarize canonical correlations that can be used instead of the genome-wide Pearson correlation coefficient, with the advantage of using the profile of the genomic regions to study their covariance at higher orders

  • To exemplify the methodology, we explored the correlation between the nucleosomal histone modification (HM) H3K4me3 and several transcription factors (TFs) and chromatin epigenetic remodelers



Introduction

Computational assessment of reproducibility across nucleic acid sequencing data is a pivotal component in genomic studies. Reproducibility can be evaluated by genome-wide Pearson correlation analysis, and peaks in replicates can be compared using Irreproducible Discovery Rate (IDR) analysis and/or overlap analysis (Bailey et al., 2013; Li et al., 2011). The author has previously developed a methodology that, by using functional principal component analysis, revealed novel correlations between histone modifications that do not colocalize (Madrigal and Krajewski, 2015). We present fCCAC, a functional canonical correlation analysis approach that allows the assessment of: (i) the reproducibility of biological or technical replicates, by analyzing their shared covariance in higher-order components; and (ii) the associations between different datasets. We propose a new statistic to summarize canonical correlations that can be used instead of the genome-wide (or peak-based) Pearson correlation coefficient, with the advantage of using the profile of the genomic regions to study their covariance at higher orders.
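To make the idea of summarizing canonical correlations more concrete, the following is a minimal Python sketch, not the fCCAC package's implementation. It assumes each sample has already been reduced to a matrix with one row per genomic region and one column per profile coefficient (for example, basis coefficients of the read-coverage curve), and it omits the smoothing that a functional analysis would normally apply. The function names, the simulated data, and the 1/i weighting of squared canonical correlations are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two profile matrices.

    Rows are genomic regions; columns are profile coefficients.
    Uses the QR + SVD approach and assumes both centered matrices
    have full column rank (many more regions than coefficients).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis for the columns of Xc
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis for the columns of Yc
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)  # guard against rounding just above 1

def weighted_summary(rho):
    """Hypothetical summary: squared correlations weighted by 1/i, as a percentage."""
    weights = 1.0 / np.arange(1, len(rho) + 1)
    return 100.0 * np.sum(weights * rho**2) / np.sum(weights)

# Simulated data (assumption): 500 regions, 5 coefficients per region.
rng = np.random.default_rng(0)
n_regions, n_coef = 500, 5
sample_a = rng.normal(size=(n_regions, n_coef))
replicate = sample_a + 0.1 * rng.normal(size=(n_regions, n_coef))  # noisy copy
unrelated = rng.normal(size=(n_regions, n_coef))                   # independent

score_replicate = weighted_summary(canonical_correlations(sample_a, replicate))
score_unrelated = weighted_summary(canonical_correlations(sample_a, unrelated))
print(f"replicate score: {score_replicate:.1f}")
print(f"unrelated score: {score_unrelated:.1f}")
```

In this sketch a replicate that shares most of its profile shape with the first sample scores close to 100, while an independent sample scores close to 0. Unlike a single Pearson coefficient computed on one value per region, the score draws on every canonical component of the region profiles.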


