Willows: a memory efficient tree and forest construction package

Heping Zhang, Minghui Wang, Xiang Chen

doi:10.1186/1471-2105-10-130

Abstract

Background: Existing tree and forest methods are powerful bioinformatics tools to explore high-dimensional data, including high-throughput genomic data. However, they cannot deal with the data generated by recent genotyping platforms for single nucleotide polymorphisms due to the massive size of the data and its excessive memory demand.

Results: Using the recursive partitioning technique, we developed a new software package, Willows, to maximize the utility of the computer memory and make it feasible to analyze massive genotype data. This package includes three tree-based methods (classification tree, random forest, and deterministic forest) and can efficiently handle the massive amount of SNP data. In addition, this package can easily set different options (e.g., algorithms and specifications) and predict the class of test samples.

Conclusion: We developed Willows in a user-friendly interface with the goal of maximizing the use of memory, which is critical for analysis of genomic data. The Willows package is well documented and publicly available at .

Highlights

Existing tree and forest methods are powerful bioinformatics tools to explore high-dimensional data, including high-throughput genomic data.
The genotype data from the Framingham Heart Study (FHS; 9,300 subjects and 550,000 single nucleotide polymorphisms (SNPs)) require more than 38.1 GB of memory for input when each genotype at a SNP marker is stored in the double data type, or 4.8 GB when stored in the byte type.
It is noteworthy that PLINK [15] and Chen et al. [16] already use memory-efficient algorithms similar to what we propose to use in trees and forests, and the compressed data format designed by PLINK has been adopted by NCBI to distribute genome-wide association (GWA) data.
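
The memory figures in the highlights can be checked with a few lines of arithmetic, and the idea of a compressed genotype layout can be illustrated with a small bit-packing sketch. The Python below is an illustration under stated assumptions, not the Willows or PLINK implementation: the function names are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 with 3 reserved for a missing call, and "GB" is read as 2^30 bytes. It packs four genotypes into each byte (2 bits per genotype) and unpacks them again.

```python
# Hypothetical sketch: names and genotype coding are assumptions,
# not taken from Willows or PLINK.

GIB = 1024 ** 3  # bytes per GB, reading "GB" as 2**30 bytes


def genotype_matrix_gib(n_subjects, n_snps, bits_per_genotype):
    """Size of a dense subjects-by-SNPs genotype matrix, in units of 2**30 bytes."""
    total_bits = n_subjects * n_snps * bits_per_genotype
    return total_bits / 8 / GIB


def pack_genotypes(codes):
    """Pack genotype codes (each 0..3) into bytes, four codes per byte.

    Assumed coding: 0, 1, 2 for the three genotypes, 3 for a missing call.
    The first code occupies the two lowest bits of each byte.
    """
    packed = bytearray((len(codes) + 3) // 4)  # round up to whole bytes
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n_genotypes):
    """Recover the first n_genotypes codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_genotypes)]


if __name__ == "__main__":
    # FHS dimensions quoted in the highlights: 9,300 subjects, 550,000 SNPs.
    for label, bits in [("double", 64), ("byte", 8), ("2-bit packed", 2)]:
        size = genotype_matrix_gib(9300, 550000, bits)
        print(f"{label:>12}: {size:.2f} GB")

    codes = [0, 1, 2, 3, 2]
    packed = pack_genotypes(codes)
    print(len(packed), unpack_genotypes(packed, len(codes)) == codes)
```

With the FHS dimensions, the 64-bit and 8-bit cases come out at about 38.1 and 4.8, which agrees with the quoted figures, and the 2-bit layout would need roughly 1.2. The trade-off is that every genotype access requires a shift and a mask instead of a direct array read.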

Journal: BMC Bioinformatics
Publication Date: May 5, 2009
Citations: 42
License type: CC BY 2.0
