Abstract
Motivation: Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information.
Results: We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods, such as Pearson correlation and mutual information. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson correlation or mutual information.
Availability and implementation: Code is available online (https://github.com/javier-pardodiaz/sdcorGCN).
Supplementary information: Supplementary data are available at Bioinformatics online.
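The abstract names signed distance correlation without defining it. The following is a minimal sketch, assuming it means the sample distance correlation of two expression profiles carrying the sign of their Pearson correlation; that definition is an assumption here, not something stated in the text above. The function names `distance_correlation` and `signed_distance_correlation` are hypothetical and are not taken from the sdcorGCN repository.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")

    # Pairwise absolute differences within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()      # squared sample distance covariance
    dvar2_x = (A * A).mean()    # squared sample distance variance of x
    dvar2_y = (B * B).mean()    # squared sample distance variance of y

    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # At least one sample is constant, so no dependence can be measured.
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def signed_distance_correlation(x, y):
    """Distance correlation with the sign of the Pearson correlation.

    Assumption: this sign convention is illustrative, not confirmed by the text.
    """
    dcor = distance_correlation(x, y)
    if dcor == 0.0:
        return 0.0
    pearson = np.corrcoef(x, y)[0, 1]
    return float(np.sign(pearson) * dcor)


# Toy expression profiles across five samples (made-up values).
gene_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gene_b = 2.0 * gene_a + 1.0   # increases with gene_a
gene_c = -gene_a              # decreases as gene_a increases

print(signed_distance_correlation(gene_a, gene_b))
print(signed_distance_correlation(gene_a, gene_c))
```

Unlike Pearson correlation, distance correlation also responds to non-linear dependence, for example between a symmetric profile and its square. The sign only adds a direction to the dependence, so that positively and negatively associated gene pairs can be told apart.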
Highlights
Gene expression data, while noisy, contains key information about biological processes (Kothapalli et al, 2002)
Using STRING, we show that networks from signed distance correlation capture more biological information and are structurally more stable than networks based on Pearson or Spearman correlation or mutual information
Summary
Gene expression data, while noisy, contains key information about biological processes (Kothapalli et al, 2002). One motivation behind creating gene coexpression networks from such data is that genes which are coexpressed across multiple samples are likely to have related functions (Hughes et al, 2000; Makrodimitris et al, 2020; Stuart et al, 2003; van Noort et al, 2003), allowing inference of gene function using guilt-by-association approaches (Wolfe et al, 2005). This procedure is especially useful if the studied organism is poorly annotated. However, the lack of reliable genomic functional information may hinder the construction of gene coexpression networks and the validation of their accuracy.