Abstract

Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information on the selected genes. Methodologically, such a tool is of greater use if it incorporates state-of-the-art computational approaches and makes its source code available.

Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of the prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of the PubMed references, GO terms, and KEGG and Reactome pathways characteristic of the sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN.

Conclusion: varSelRF and GeneSrF implement a validated method for gene selection that includes bootstrap estimates of the classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology (a combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.

Highlights

  • Microarray data are often used for patient classification and gene selection

  • Patient classification and gene selection related to classification are common uses of microarray data, but statistically rigorous and user-friendly tools for gene selection in the context of class prediction are rare

  • Our programs allow the exploratory use of random forest for identifying large subsets of genes potentially relevant for class prediction


Summary

Results

We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Output includes bootstrapped estimates of the prediction error rate and assessments of the stability of the solutions. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. Conclusion: varSelRF and GeneSrF implement a validated method for gene selection that includes bootstrap estimates of the classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology (a combination of parallelization with a web-based application), they are of methodological interest to bioinformaticians and biostatisticians.
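
The idea of a bootstrapped error estimate that avoids selection bias can be illustrated with a small sketch. It is not the varSelRF or GeneSrF implementation: as loudly labeled assumptions, it replaces random forest with a simple ranking by difference of class means and a nearest-centroid classifier, it handles two classes only, and it uses a plain out-of-bag bootstrap error. All function names (`select_genes`, `bootstrap_error`, `make_data`) and the synthetic data are hypothetical. The point it shows is that gene selection is repeated inside every bootstrap resample, so the samples used to measure error never influence which genes were chosen, and that counting how often each gene is selected gives a rough view of solution stability.

```python
import random
from collections import Counter
from statistics import mean


def class_means(X, y, label, genes):
    """Mean expression of each gene in `genes` over samples of one class."""
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [mean(r[j] for r in rows) for j in genes]


def select_genes(X, y, k):
    """Stand-in for random forest importance (assumption): keep the k genes
    with the largest absolute difference in class means. Two classes only."""
    all_genes = range(len(X[0]))
    lo, hi = sorted(set(y))
    m_lo = class_means(X, y, lo, all_genes)
    m_hi = class_means(X, y, hi, all_genes)
    scores = [abs(a - b) for a, b in zip(m_lo, m_hi)]
    return sorted(all_genes, key=lambda j: -scores[j])[:k]


def predict(centroids, row, genes):
    """Nearest-centroid prediction using only the selected genes."""
    def sq_dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(genes, centroids[label]))
    return min(centroids, key=sq_dist)


def bootstrap_error(X, y, k=1, B=50, seed=0):
    """Out-of-bag bootstrap error with selection redone in every resample.
    Returns (mean error, {gene: fraction of resamples that selected it}).
    Raises statistics.StatisticsError if no resample is usable."""
    rng = random.Random(seed)
    n = len(y)
    errors, counts = [], Counter()
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if not oob or len(set(yb)) < 2:
            continue  # skip resamples with no test samples or a single class
        genes = select_genes(Xb, yb, k)  # selection sees in-bag samples only
        centroids = {lab: class_means(Xb, yb, lab, genes) for lab in set(yb)}
        wrong = sum(predict(centroids, X[i], genes) != y[i] for i in oob)
        errors.append(wrong / len(oob))
        counts.update(genes)
    used = len(errors)
    return mean(errors), {g: c / used for g, c in counts.items()}


def make_data(n=20, p=6, seed=1):
    """Synthetic two-class data in which only gene 0 is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[0] += 5 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    err, freq = bootstrap_error(X, y, k=1, B=50)
    print("bootstrap error estimate:", err)
    print("selection frequency per gene:", freq)
```

Selecting genes once on the full data set and then bootstrapping only the classifier would let the held-out samples influence the selection and typically yields optimistic error estimates, which is the selection bias the abstract refers to.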
