Abstract

This study aims to reduce the dimension of high-dimensional gene expression data using supervised principal component analysis (S-PCA) and, as a newly proposed approach, supervised principal component analysis with artificial neural networks (S-ANN-PCA), and to compare the performance of the two methods using random survival forests (RSF). In the simulation study, 5000 genes were generated from a multivariate normal distribution, and survival times correlated with these gene data were then generated for 100 units. The simulation was repeated 1000 times. In addition, gene expression data from 240 individuals with diffuse large B-cell lymphoma (DLBCL) were used. For dimension reduction, important genes were selected using the Wald statistic. The reduced data sets produced by each method were then analyzed with RSF. In the simulation study, the explanatory power of S-PCA differed significantly from that of S-ANN-PCA (p<0.001). In the DLBCL application, RSF gave an error rate of 36.78% for S-PCA and 43% for S-ANN-PCA. S-PCA had a higher importance value and a lower error rate than S-ANN-PCA. Overall, S-PCA performed better than S-ANN-PCA in analyzing gene expression data affected by the high-dimensionality problem.
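The S-PCA steps named in the abstract (simulate genes, screen them by a Wald statistic, run PCA on the selected genes) can be illustrated with a minimal sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the Wald statistic comes from a univariate Cox model per gene, the top 50 genes are kept, the gene covariance is the identity, 5 genes are informative, there is no censoring, and 500 genes are used instead of 5000 to keep the run short. The function names `cox_wald_z` and `supervised_pca` are hypothetical. The RSF comparison step is left out.

```python
import numpy as np

def _score_info(X, event, beta):
    # Rows are sorted by descending time, so the risk set of row i is rows 0..i.
    w = np.exp(np.clip(X * beta, -30, 30))
    s0 = np.cumsum(w, axis=0)
    mean = np.cumsum(w * X, axis=0) / s0
    var = np.cumsum(w * X**2, axis=0) / s0 - mean**2
    score = np.sum(event * (X - mean), axis=0)
    info = np.maximum(np.sum(event * var, axis=0), 1e-12)
    return score, info

def cox_wald_z(X, time, event, n_iter=25):
    """Wald z-statistic of a univariate Cox model for each column of X.

    Assumes no tied survival times (illustrative simplification).
    """
    order = np.argsort(-time)
    X = X[order]
    event = event[order].astype(float)[:, None]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = _score_info(X, event, beta)
        beta = beta + np.clip(score / info, -1.0, 1.0)  # damped Newton step
    _, info = _score_info(X, event, beta)
    return beta * np.sqrt(info)  # beta / se(beta), with se = 1 / sqrt(info)

def supervised_pca(X, time, event, n_genes=50, n_components=2):
    """Keep the genes with the largest |Wald z|, then run PCA on them."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    z = cox_wald_z(Xs, time, event)
    keep = np.argsort(-np.abs(z))[:n_genes]
    U, S, _ = np.linalg.svd(Xs[:, keep], full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return keep, scores, z

# Simulated data: 100 units, 500 genes (assumed sizes and effect strengths).
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))              # identity covariance (assumption)
lin = X[:, :5].sum(axis=1)                   # first 5 genes drive the hazard
time = rng.exponential(scale=np.exp(-lin))   # exponential survival times
event = np.ones(n, dtype=int)                # no censoring (assumption)

keep, scores, z = supervised_pca(X, time, event)
print(scores.shape, sorted(keep[:5]))
```

The component scores in `scores` would then serve as the reduced covariates passed to a survival model such as RSF.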
