Abstract

We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features. It is intended to guide biological exploration of high-dimensional datasets rather than to provide immediate answers. The software enables several biologically motivated approaches to these data, and here we describe the rationale and implementation of each. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data. Instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association, which is intended to suggest potentially relevant relationships between the datasets.

Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the SourceForge repository. The package is pending submission to Bioconductor.

Highlights

  • Manual exploration of high-dimensional whole-genome datasets is possible, to a limited extent, with newer, high-capacity genome browsers

  • The well-established overlap of CpG islands with the promoter regions of genes [1] is critically related to the gene-silencing mechanism of DNA hypermethylation

  • Using spatial proximity to infer functional relationships is valid in many cases, but proximity is not necessary for functional interaction: chromatin is flexible, and many activating and repressive marks act at a distance [2]. Ideally, software that attempts to uncover important relationships automatically should be sensitive to these interactions as well


Summary

Introduction

Manual exploration of high-dimensional whole-genome datasets is possible, to a limited extent, with newer, high-capacity genome browsers. Each of the statistical tests in the package is designed to evaluate whether the spatial distribution of the query intervals is independent of the positions of the reference intervals, and each test is sensitive to a different aspect of known biological relationships.
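To make the idea of testing independence between query and reference positions concrete, here is a minimal sketch in Python. It is not the package's R implementation. It assumes one particular statistic for illustration: the relative distance of each query point to its two flanking reference points, which should be roughly uniform on [0, 0.5] if the query is placed independently of the reference. The function names and the Monte Carlo null (query points drawn uniformly over the reference span) are assumptions made for this example.

```python
import bisect
import random


def relative_distances(query, reference):
    """Relative distance of each query point to its flanking reference points.

    Each value lies in [0, 0.5]. Query points that do not fall between two
    reference points are skipped.
    """
    ref = sorted(set(reference))
    distances = []
    for q in query:
        i = bisect.bisect_right(ref, q)
        if i == 0 or i == len(ref):
            continue  # no flanking pair of reference points
        left, right = ref[i - 1], ref[i]
        distances.append(min(q - left, right - q) / (right - left))
    return distances


def ks_uniform(values, upper=0.5):
    """Kolmogorov-Smirnov distance between a sample and Uniform(0, upper)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        cdf = x / upper
        d = max(d, (k + 1) / n - cdf, cdf - k / n)
    return d


def independence_p_value(query, reference, n_sim=999, seed=0):
    """Monte Carlo p-value for the hypothesis that query positions are
    independent of reference positions (illustrative null model only)."""
    rng = random.Random(seed)
    lo, hi = min(reference), max(reference)
    observed_distances = relative_distances(query, reference)
    observed = ks_uniform(observed_distances)
    n = len(observed_distances)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(lo, hi) for _ in range(n)]
        if ks_uniform(relative_distances(simulated, reference)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical data: reference points every 100 bp, query points 1 bp downstream.
reference = list(range(0, 1001, 100))
query = [r + 1 for r in reference[:-1]]
statistic, p_value = independence_p_value(query, reference)
print(statistic, p_value)
```

In this toy input every query point sits immediately next to a reference point, so the relative distances pile up near zero and the simulated null rarely produces a statistic as extreme. A statistic of this kind responds to positioning relative to neighbouring features; other statistics would be needed to capture other aspects of association, such as overlap.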

