Abstract
Gene co-expression networks (GCNs) are constructed from gene expression matrices (GEMs) in a bottom-up approach in which all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not scale to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach in which no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but also allows non-linear relationships between genes to be explored by leveraging deep learning image analysis algorithms. Using the learned embedded feature space, we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types and that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.
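The abstract does not spell out how a gene pair becomes an image, so the following is only a minimal sketch of one plausible encoding, not EdgeScaping's actual procedure. It assumes each pair is rasterized as a 2D histogram of the two genes' expression values across samples, drawn on a fixed axis range so that the position of the pattern in the image reflects expression intensity. The function name `pair_to_image`, the 32 x 32 resolution, and the 0 to 16 value range are all hypothetical choices made for illustration.

```python
import numpy as np


def pair_to_image(x, y, bins=32, value_range=(0.0, 16.0)):
    """Rasterize one gene pair into a bins x bins image.

    Hypothetical encoding: a 2D histogram of the samples, normalized to sum to 1.
    The fixed value_range keeps low- and high-intensity pairs in different
    regions of the image instead of rescaling each pair independently.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)  # drop samples with missing values
    hist, _, _ = np.histogram2d(
        x[keep], y[keep], bins=bins, range=[value_range, value_range]
    )
    total = hist.sum()
    return hist / total if total > 0 else hist


# Synthetic pair with a non-linear (parabolic) dependence, for illustration only.
rng = np.random.default_rng(0)
gene_a = rng.uniform(2.0, 14.0, size=500)
gene_b = np.clip(
    0.05 * (gene_a - 8.0) ** 2 + 6.0 + rng.normal(0.0, 0.3, size=500), 0.0, 16.0
)

image = pair_to_image(gene_a, gene_b)
print(image.shape)
```

Under this assumed encoding, every pair is reduced to the same fixed number of pixels regardless of how many samples the GEM contains, which is one way the feature compression described above could arise.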
Highlights
A fundamental goal of biology is to discover genetic relationships that coordinate the biochemical mechanisms underlying phenotype expression.
A standardized way to contextualize these relationships is via gene co-expression networks (GCNs, also known as relevance networks [1]), which are mathematical graphs used to model complex global gene co-expression dependencies extracted from gene expression matrices (GEMs).
To construct the GEM, all normalized isoform datasets for lower grade glioma (LGG; 534 samples), thyroid cancer (THCA; 572 samples), glioblastoma (GBM; 174 samples), ovarian cancer (OV; 309 samples), and bladder cancer (BLCA; 427 samples) were obtained from The Cancer Genome Atlas [17].
Summary
A fundamental goal of biology is to discover genetic relationships that coordinate the biochemical mechanisms underlying phenotype expression. A standardized way to contextualize these relationships is via gene co-expression networks (GCNs, also known as relevance networks [1]), which are mathematical graphs used to model complex global gene co-expression dependencies extracted from gene expression matrices (GEMs). The first reported GCN was developed by Eisen et al. [2], and GCNs have since been used for several different species-specific analyses, including cancer studies [3,4,5,6]. A plethora of software tools and technologies have since been developed for the construction of GCNs, each using a different approach for identifying co-expression patterns. WGCNA [7], CLR [8], MRNET [9], RMTGeneNet [10], KINC [11], petal [12] and FastGCN [13] are a few of the more broadly utilized techniques.
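To make the bottom-up idea concrete, the sketch below builds a toy network by computing the Pearson correlation for every gene pair and keeping only the pairs above a hard cutoff. It is a generic illustration and does not reproduce the method of any tool named above, each of which uses its own statistic and thresholding scheme. The function name `correlation_edges`, the 0.8 cutoff, and the synthetic four-gene matrix are assumptions made for this example.

```python
import numpy as np


def correlation_edges(gem, threshold=0.8):
    """Return (i, j, r) for every gene pair i < j with |Pearson r| >= threshold.

    gem is a genes x samples array. The threshold value is an illustrative
    assumption, not a recommended setting.
    """
    corr = np.corrcoef(gem)  # genes x genes Pearson correlation matrix
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)  # each pair once
    r = corr[i_idx, j_idx]
    keep = np.abs(r) >= threshold
    return [
        (int(i), int(j), float(v))
        for i, j, v in zip(i_idx[keep], j_idx[keep], r[keep])
    ]


# Synthetic GEM: gene 1 depends linearly on gene 0, gene 2 depends on gene 0
# non-linearly (quadratically), and gene 3 is unrelated noise.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
gem = np.vstack([
    base,
    2.0 * base + rng.normal(scale=0.1, size=200),
    base ** 2,
    rng.normal(size=200),
])

edges = correlation_edges(gem)
print(edges)
```

In this toy matrix only the linear pair (genes 0 and 1) should pass the cutoff. The quadratic pair (genes 0 and 2) is fully dependent yet has a near-zero Pearson correlation, which is the kind of relationship a correlation filter discards.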