Abstract

The inherent correlations among gene expression levels have received growing attention. It was recently reported that a set of approximately 1000 landmark genes can be used to predict the expression of other genes (target genes). The objective of this study is to predict the expression values of target genes from the expression values of landmark genes. A cluster-based regression method is proposed. In the proposed method, the training instances of a gene are grouped into clusters and one estimator is fitted per cluster. A test instance is assigned to one of the clusters, and the regression model corresponding to that cluster predicts its expression value. The performance of the proposed method is measured on the GEO (Gene Expression Omnibus) expression data and the GTEx (Genotype-Tissue Expression) expression data. In terms of mean absolute error averaged across target genes, the proposed method significantly outperforms previous approaches on the GEO expression data. The experimental results show that the combination of clustering and regression can outperform state-of-the-art methods such as generative adversarial networks and a gradient-boosting-based method.
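The cluster-then-regress procedure described above can be illustrated with a minimal sketch for a single target gene. The abstract does not name the clustering algorithm or the per-cluster estimator, so the k-means clustering, the ordinary least-squares models, the nearest-centroid assignment of test instances, the toy data, and all function names below are assumptions made for illustration only, not the method evaluated in the study.

```python
import numpy as np

def assign_clusters(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_clusters(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def add_intercept(X):
    """Append a column of ones so the linear model has a bias term."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_linear(X, y):
    """Ordinary least squares (an assumed choice of per-cluster estimator)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef

def fit_cluster_regression(X, y, k):
    """Cluster the training instances, then fit one linear model per cluster."""
    centers = fit_kmeans(X, k)
    labels = assign_clusters(X, centers)
    fallback = fit_linear(X, y)  # used only if a cluster has no members
    models = []
    for j in range(k):
        mask = labels == j
        models.append(fit_linear(X[mask], y[mask]) if mask.any() else fallback)
    return centers, models

def predict_cluster_regression(X, centers, models):
    """Assign each test instance to a cluster and predict with that cluster's model."""
    labels = assign_clusters(X, centers)
    A = add_intercept(X)
    preds = np.empty(len(X))
    for j, coef in enumerate(models):
        mask = labels == j
        preds[mask] = A[mask] @ coef
    return preds

def make_toy_data(n, rng):
    """Synthetic stand-in: two 'landmark' features, two regimes for one target."""
    Xa = rng.normal(0.0, 1.0, size=(n, 2))
    Xb = rng.normal(10.0, 1.0, size=(n, 2))
    ya = 2.0 * Xa[:, 0] + 1.0
    yb = -3.0 * Xb[:, 0] + 5.0 * Xb[:, 1]
    return np.vstack([Xa, Xb]), np.concatenate([ya, yb])

rng = np.random.default_rng(42)
X_train, y_train = make_toy_data(100, rng)
X_test, y_test = make_toy_data(50, rng)

centers, models = fit_cluster_regression(X_train, y_train, k=2)
preds = predict_cluster_regression(X_test, centers, models)
mae_cluster = np.abs(preds - y_test).mean()

# Baseline: a single linear model fitted on all training instances.
global_coef = fit_linear(X_train, y_train)
mae_global = np.abs(add_intercept(X_test) @ global_coef - y_test).mean()
print("MAE (cluster-based):", mae_cluster, "MAE (single model):", mae_global)
```

In a setting like the one the abstract describes, such a procedure would be repeated for each target gene and the mean absolute errors averaged across genes; the number of clusters `k` is a free parameter in this sketch.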
