Abstract

Background: Molecular measurements from cancer patients such as gene expression and DNA methylation can be influenced by several external factors. This makes it harder to reproduce the exact values of measurements coming from different laboratories. Furthermore, some cancer types are very heterogeneous, meaning that there might be different underlying causes for the same type of cancer among different individuals. If a model does not take potential biases in the data into account, this can lead to problems when trying to predict the stage of a certain cancer type. This is especially true when these biases differ between the training and test set.

Results: We introduce a method that can estimate this bias on a per-feature level and incorporate calculated feature confidences into a weighted combination of classifiers with disjoint feature sets. In this way, the method provides a prediction that is adjusted for the potential biases on a per-patient basis, providing a personalized prediction for each test patient. The new method achieves state-of-the-art performance on many different cancer data sets with measured DNA methylation or gene expression. Moreover, we show how to visualize the learned classifiers to display interesting associations with the target label. Applied to a leukemia data set, our method finds several ribosomal proteins associated with the risk group, which might be interesting targets for follow-up studies. This discovery supports the hypothesis that the ribosomes are a new frontier in gene regulation.

Conclusion: We introduce a new method for robust prediction of phenotypes from molecular measurements such as DNA methylation or gene expression. Furthermore, the visualization capabilities enable exploratory analysis on the learnt dependencies and pave the way for a personalized prediction of phenotypes.
The software is available under GPL2+ from https://github.com/adrinjalali/Network-Classifier/tree/v1.0.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2647-9) contains supplementary material, which is available to authorized users.
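The idea of weighting classifiers built on disjoint feature sets by per-patient feature confidences can be illustrated with a small sketch. This is not the authors' implementation: the nearest-centroid base classifiers, the z-score-based confidence, the toy data, and all function names below are assumptions chosen only to make the idea concrete with the Python standard library.

```python
import math
from statistics import mean, pstdev

def fit_feature_stats(X):
    """Per-feature mean and standard deviation over the training samples."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]

def feature_confidence(x, stats):
    """Hypothetical confidence in (0, 1]: near 1 when a value resembles the training data."""
    return [math.exp(-0.5 * ((v - m) / s) ** 2) for v, (m, s) in zip(x, stats)]

def fit_centroids(X, y, features):
    """Stand-in base classifier: class centroids restricted to one feature subset."""
    centroids = {}
    for label in set(y):
        rows = [[x[j] for j in features] for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_centroid(x, features, centroids):
    """Return the label whose centroid is closest on this feature subset."""
    sub = [x[j] for j in features]
    return min(centroids, key=lambda lab: math.dist(sub, centroids[lab]))

def weighted_predict(x, groups, models, stats):
    """Combine the base classifiers, weighting each by its mean feature confidence."""
    conf = feature_confidence(x, stats)
    votes = {}
    for features, centroids in zip(groups, models):
        weight = mean(conf[j] for j in features)
        label = predict_centroid(x, features, centroids)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get), votes

# Toy training data (made up): four features, two classes.
X = [[0.0, 0.1, 0.0, 0.2],
     [0.1, 0.0, 0.2, 0.0],
     [1.0, 0.9, 1.0, 0.8],
     [0.9, 1.0, 0.8, 1.0]]
y = [0, 0, 1, 1]

groups = [[0, 1], [2, 3]]  # disjoint feature sets, one classifier each
stats = fit_feature_stats(X)
models = [fit_centroids(X, y, g) for g in groups]

# Test patient whose last two features are strongly shifted (simulated bias).
patient = [0.9, 1.0, -5.0, -5.0]
label, votes = weighted_predict(patient, groups, models, stats)
print(label, votes)
```

In this toy case the classifier on the shifted features votes for class 0, but its weight is close to zero because those values lie far outside the training range, so the combined prediction follows the classifier on the unaffected features. An unweighted vote between the two classifiers would be tied.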

Highlights

  • Molecular measurements from cancer patients such as gene expression and DNA methylation can be influenced by several external factors

  • [...] nodes on additional experiments on synthesized data, as shown in Additional file 1. Another issue is that in protein-protein interaction (PPI) networks, genes or proteins that have been known to researchers for longer are studied more and have more edges connected to them, whereas less well-known genes and proteins lie in sparser areas of the network (Jalali and Pfeifer, BMC Genomics (2016) 17:501)

  • Materials, data sources: In this article, our method is applied to two different data types, gene expression data and DNA methylation data, which we retrieved from The Cancer Genome Atlas (TCGA) [16]

Summary

Introduction

Molecular measurements from cancer patients such as gene expression and DNA methylation can be influenced by several external factors. Predicting survival of cancer patients based on measurements from microarray experiments has been a [...] nodes on additional experiments on synthesized data, as shown in Additional file 1. Another issue is that in PPI networks, genes or proteins that have been known to researchers for longer are studied more and have more edges connected to them, whereas less well-known genes and proteins lie in sparser areas of the network. A central assumption underlying many methods is that all data are drawn from the same unknown underlying distribution. This may not be the case, especially for heterogeneous cancer samples, and in particular not for all measured genes.
