Abstract

Gene expression is controlled by many simultaneous interactions, frequently measured collectively in biology and medicine by high-throughput technologies. It is a highly challenging task to infer from these data the generating effects and cooperating genes. Here, we present an unsupervised hypothesis-generating learning concept termed signal dissection by correlation maximization (SDCM) that dissects large high-dimensional datasets into signatures. Each signature captures a particular signal pattern that was consistently observed for multiple genes and samples, likely caused by the same underlying interaction. A key difference to other methods is our flexible nonlinear signal superposition model, combined with a precise regression technique. Analyzing gene expression of diffuse large B-cell lymphoma, our method discovers previously unidentified signatures that reveal significant differences in patient survival. These signatures are more predictive than those from various methods used for comparison and robustly validate across technological platforms. This implies highly specific extraction of clinically relevant gene interactions.

Highlights

  • Gene expression is controlled by many simultaneous interactions, frequently measured collectively in biology and medicine by high-throughput technologies

  • In the absence of overlapping effects and measurement noise, we model gene expression (GE) data, once sorted by specific gene and sample orders, as a heatmap in which each participating gene row and each sample column consists only of monotonic expression values, i.e., all follow a common order

  • We provide an unsupervised learning concept for dissecting large datasets that typically comprise the collective signal from many unknown interactions together with substantial measurement noise
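
The monotonic heatmap model in the second highlight can be illustrated with a small check. This is only a sketch under stated assumptions, not the SDCM procedure itself: the function names `is_monotonic` and `follows_common_order` are hypothetical, the toy values are invented, and building the noise-free heatmap as an outer product of sorted gene and sample strengths is an illustrative choice. The check also accepts either direction (rising or falling) per row and column, which is an assumption of this sketch.

```python
def is_monotonic(values):
    """True if values are entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def follows_common_order(heatmap):
    """Check that every row and every column of a 2-D list is monotonic."""
    rows_ok = all(is_monotonic(row) for row in heatmap)
    cols_ok = all(is_monotonic(list(col)) for col in zip(*heatmap))
    return rows_ok and cols_ok


# Hypothetical noise-free toy signature: outer product of gene and sample
# strengths, both already sorted. All numbers are invented for illustration.
gene_strengths = [-1.0, -0.5, 0.5, 2.0]
sample_strengths = [-2.0, 0.0, 1.0]
toy_heatmap = [[g * s for s in sample_strengths] for g in gene_strengths]

# Swapping two entries in one row breaks the common order.
shuffled = [row[:] for row in toy_heatmap]
shuffled[3][0], shuffled[3][1] = shuffled[3][1], shuffled[3][0]

print(follows_common_order(toy_heatmap))
print(follows_common_order(shuffled))
```

In real data, overlapping effects and measurement noise would break strict monotonicity, so a check like this only makes sense for the idealized case the highlight describes.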

Summary

Introduction

The analysis of gene expression (GE) using microarrays or high-throughput RNA-sequencing[1] allows the determination of molecular interactions and GE programs in cancer cells[2,3]. These technologies measure concurrent GE programs in all cells of a sample collectively. Any gene subset showing significant co-expression for any subset of samples could represent a biological interaction.
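
To make the idea of co-expression restricted to a sample subset concrete, the following sketch compares the Pearson correlation of two genes over all samples with the correlation over a chosen subset. It is an illustration only and not the authors' method: the expression values are invented, the helper `pearson` and the variable names are hypothetical, and no significance test is performed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented expression values for two genes across six samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 0.5, 3.5]
gene_b = [2.1, 3.9, 6.2, 8.0, 5.0, 1.0]

# Hypothetical sample subset in which the two genes move together.
subset = [0, 1, 2, 3]

r_all = pearson(gene_a, gene_b)
r_subset = pearson([gene_a[i] for i in subset], [gene_b[i] for i in subset])

print(f"all samples: r = {r_all:.3f}")
print(f"subset only: r = {r_subset:.3f}")
```

In this toy case the two genes are strongly correlated within the subset but only weakly across all samples, which is why a signal confined to a subset of samples can be missed by a global correlation.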
