Abstract
Summary
Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both evaluation of the reproducibility of biological or technical replicates and comparison of different datasets to identify potential correlations between them. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers.
Availability and Implementation
An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Highlights
Computational assessment of reproducibility across nucleic acid sequencing data is a pivotal component in genomic studies
We propose a new statistic to summarize canonical correlations that can be used instead of the genome-wide Pearson correlation coefficient, with the advantage of using the profile of the genomic regions to study their covariance at higher orders
To exemplify the methodology, we explored the correlation between the nucleosomal histone modification (HM) H3K4me3 and several transcription factors (TFs) and chromatin epigenetic remodelers
Summary
Computational assessment of reproducibility across nucleic acid sequencing data is a pivotal component in genomic studies. Reproducibility can be evaluated by genome-wide Pearson correlation analysis, and peaks in replicates can be compared using Irreproducible Discovery Rate (IDR) analysis and/or overlap analysis (Bailey et al., 2013; Li et al., 2011). The author has previously developed a methodology that, by using functional principal component analysis, revealed novel correlations between histone modifications that do not colocalize (Madrigal and Krajewski, 2015). We present fCCAC, a functional canonical correlation analysis approach that allows the assessment of: (i) the reproducibility of biological or technical replicates, by analyzing their shared covariance in higher order components; and (ii) the associations between different datasets. We propose a new statistic to summarize canonical correlations that can be used instead of the genome-wide (or peak-based) Pearson correlation coefficient, with the advantage of using the profile of the genomic regions to study their covariance at higher orders.
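For orientation, the baseline that fCCAC is positioned against, a genome-wide Pearson correlation between two replicates, can be sketched as follows. This is a minimal illustration and not part of the fCCAC package: the function name `pearson` and the binned read counts `rep1`, `rep2` and `rep3` are hypothetical, and a real analysis would use coverage computed from aligned reads. The sketch reduces each replicate to one number per bin, so the shape of the signal within a region is not modelled, which is the information the functional approach is meant to use.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts in eight consecutive genomic bins.
rep1 = [2, 5, 40, 38, 6, 1, 0, 22]
rep2 = [4, 10, 80, 76, 12, 2, 0, 44]  # same profile at twice the depth
rep3 = [0, 2, 5, 40, 38, 6, 1, 0]     # similar peak, shifted by one bin

r_same_profile = pearson(rep1, rep2)
r_shifted = pearson(rep1, rep3)
print(r_same_profile, r_shifted)
```

Because `rep2` is an exact rescaling of `rep1`, their coefficient is 1 up to floating-point error, while the shifted profile in `rep3` scores lower even though the peak shape is nearly unchanged.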