Abstract

Tree ensembles can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. The focus of our work is the utility of tree-based ensembles as kernel generators that, in conjunction with a regularized linear model, enable kernel learning. We elucidate the performance of the random forest (RF) and gradient boosted tree (GBT) kernels in a comprehensive simulation study comprising continuous and binary targets. We show that for continuous targets (regression), this kernel learning approach is competitive with the respective tree ensemble in higher-dimensional scenarios, particularly in cases with a larger number of noisy features. For binary targets (classification), the tree ensemble based kernels and their respective ensembles exhibit comparable performance. We provide results from several real-life regression and classification datasets relevant to biopharmaceutical and biomedical applications; these results are in line with the simulations and show how the insights may be leveraged in practice. We discuss the general applicability and extensions of the tree ensemble based kernels to survival targets and to interpretable landmarking in classification and regression. Finally, we outline future research on kernel learning arising from feature space partitionings.
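As a minimal sketch of the idea described above (not the authors' code), the snippet below builds a random forest proximity kernel, where K[i, j] is the fraction of trees in which samples i and j fall in the same leaf, and plugs it into a kernel ridge regression standing in for the regularized linear model; the dataset, function names, and parameter values are illustrative assumptions.

```python
# Sketch: RF proximity matrix as a precomputed kernel for a regularized linear model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

def rf_proximity(forest, A, B):
    """Proximity kernel: share of trees in which rows of A and B share a leaf."""
    leaves_A = forest.apply(A)   # shape (n_A, n_trees), leaf index per tree
    leaves_B = forest.apply(B)   # shape (n_B, n_trees)
    return (leaves_A[:, None, :] == leaves_B[None, :, :]).mean(axis=2)

K_tr = rf_proximity(rf, X_tr, X_tr)   # train x train kernel
K_te = rf_proximity(rf, X_te, X_tr)   # test x train kernel

krr = KernelRidge(kernel="precomputed", alpha=1.0).fit(K_tr, y_tr)
print("RF test R^2:        ", rf.score(X_te, y_te))
print("RF-kernel test R^2: ", krr.score(K_te, y_te))
```

An analogous construction applies to GBT ensembles and to classification, where the precomputed kernel would feed a regularized linear classifier instead of kernel ridge regression.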
