Abstract
Background: An increasing number of genomic studies interrogating more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Such analyses often require knowledge of which elements of one platform link to those of another. Although this matching is important, many integrative analyses do not detail it, or detail it insufficiently.
Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R-package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and shows that they may even lead to different results for individual genes.
Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R-package sigaR, available from Bioconductor.
Highlights
An increasing number of genomic studies interrogating more than one molecular level are being published
They are implemented in the R-package sigaR
The Cancer Genome Atlas (TCGA) I and II data sets differ in their gene expression data, which were generated on different platforms
Summary
Five data sets have been downloaded to compare the matching procedures. Data set 1 is referred to as the Chin data set. The even worse ‘performance’ (1.8% matched gene expression features) of the distanceAny procedure with a smaller window on the Chin data set may be attributed to the size of the DNA copy number features (BACs). They are rather long compared to the gene expression features, resulting in large distances between the midpoints. For the Chin data set, the distance procedure finds the most significant genes, followed by distanceAny (< 100 k), overlapPlus and the other overlap methods. This order is concordant with the matching result: the more matched genes, the more discoveries. This could be interpreted as the matched genes being assigned an unrelated DNA copy number signature. This comparison of downstream analyses suggests that (for data generated on a high-resolution DNA copy number platform) the overlapAny procedure may be preferred.
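The effect of feature length on midpoint-based matching can be illustrated with a minimal sketch. This is not the sigaR implementation: the `Feature` class, the two matching functions and all coordinates below are hypothetical, and both features are assumed to lie on the same chromosome. The sketch contrasts an overlap criterion with a midpoint-distance criterion for a short gene expression feature and a long BAC-like copy number feature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """A genomic feature on a single chromosome (hypothetical structure)."""
    name: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs (inclusive)

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


def match_by_overlap(gene: Feature, cn_features: List[Feature]) -> List[str]:
    """Names of copy number features sharing at least one base with the gene."""
    return [f.name for f in cn_features
            if f.start <= gene.end and gene.start <= f.end]


def match_by_midpoint_distance(gene: Feature, cn_features: List[Feature],
                               window: int) -> List[str]:
    """Names of copy number features whose midpoint lies within `window` bp
    of the gene's midpoint."""
    return [f.name for f in cn_features
            if abs(f.midpoint - gene.midpoint) <= window]


# Made-up coordinates: a 20 kb expression feature inside a 160 kb BAC-like feature.
gene = Feature("geneA", 1_000_000, 1_020_000)  # midpoint 1,010,000
bac = Feature("bac1", 900_000, 1_060_000)      # midpoint   980,000

print(match_by_overlap(gene, [bac]))                     # the BAC covers the gene
print(match_by_midpoint_distance(gene, [bac], 10_000))   # midpoints are 30 kb apart
print(match_by_midpoint_distance(gene, [bac], 100_000))  # a wider window recovers the match
```

In this toy case the gene lies entirely inside the long feature, yet a 10 kb midpoint window leaves it unmatched because the two midpoints are 30 kb apart. Under these assumptions, a narrow window combined with long copy number features can therefore leave few expression features matched.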