Abstract

High-throughput gene expression analysis is widely used. However, analyzing the resulting data is not straightforward: multiple approaches should be applied, and methods to combine their results should be implemented and investigated. We present a methodology for the comprehensive analysis of expression data, including co-expression module detection and result integration via data fusion, threshold-based methods, and a Naïve Bayes classifier trained on simulated data. Application to rare-disease model datasets confirms existing knowledge related to immune cell infiltration and suggests novel hypotheses, including the role of calcium channels. Application to simulated and spike-in experiments shows that combining multiple methods using consensus and classifiers leads to optimal results. ExpHunter Suite is implemented as an R/Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite. It can be applied to model and non-model organisms and can be run modularly in R; it can also be run from the command line, allowing scalability with large datasets. Code and reports for the studies are available from https://github.com/fmjabato/ExpHunterSuiteExamples.

Highlights

  • RNA sequencing (RNA-seq) is widely used across molecular biology and biomedicine, including rare disease research[1]

  • Co-expression analysis, which searches for groups of co-expressed genes (CEGs) that correlate with phenotypic data[11], is often overlooked in RNA-seq data analysis, despite its potential for better understanding molecular processes

  • Differential expression (DE) analysis was performed on a range of simulated expression datasets to evaluate how different properties of a dataset can affect the performance of differentially expressed gene (DEG) detection and combination methods


Summary

Introduction

RNA sequencing (RNA-seq) is widely used across molecular biology and biomedicine, including rare disease research[1]. We provide a collection of tools, the ExpHunter Suite, implemented as an R/Bioconductor package that includes auxiliary scripts for assessing performance and simulating RNA-seq data. It incorporates the DEgenes Hunter pipeline[13], in addition to co-expression analysis and multiple reports related to quality control and result interpretation, and provides ways to compare and combine results. Through co-expression analysis, we find examples of divergent expression patterns between mRNA transcript and protein levels for the same gene, and we detect genes related to the extracellular matrix with a potential role in PMM2-CDG, as well as modules of genes including triggers of NF-κB and MAPK processes in Lafora disease. These findings show the capability of our methodology to detect novel genes and functions for further study.
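
To make the idea of combining results from several DE methods concrete, the following is a minimal sketch of a threshold-based consensus vote. It is written in Python purely for illustration and is not the ExpHunter Suite API; the function names (`call_degs`, `consensus_degs`), the cutoffs, the method labels, and the toy values are all hypothetical assumptions.

```python
from collections import Counter


def call_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return the genes one method calls DE.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    The cutoffs are illustrative assumptions, not values from the source.
    """
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff
    }


def consensus_degs(calls_by_method, min_votes=2):
    """Return genes called DE by at least `min_votes` methods, sorted by name."""
    votes = Counter()
    for genes in calls_by_method.values():
        votes.update(set(genes))  # each method contributes one vote per gene
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Toy per-method results (hypothetical numbers, for illustration only).
method_results = {
    "method_A": {"gene1": (2.1, 0.001), "gene2": (0.3, 0.20), "gene3": (-1.8, 0.01)},
    "method_B": {"gene1": (1.9, 0.004), "gene2": (1.2, 0.03), "gene3": (-1.5, 0.06)},
    "method_C": {"gene1": (2.4, 0.0005), "gene2": (0.9, 0.04), "gene3": (-1.7, 0.02)},
}

calls = {name: call_degs(res) for name, res in method_results.items()}
consensus = consensus_degs(calls, min_votes=2)
print(consensus)
```

Raising `min_votes` makes the consensus stricter (fewer, more consistently supported genes), while lowering it to 1 yields the union of all methods' calls.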

