Towards scalable quantile regression trees

Harish S Bhat, Nitesh Kumar, Garnet J Vaz

doi:10.1109/bigdata.2015.7363741

Abstract

We provide an algorithm to build quantile regression trees in O(N log N) time, where N is the number of instances in the training set. Quantile regression trees are regression trees that model conditional quantiles of the response variable, rather than the conditional expectation modeled by standard regression trees. We build quantile regression trees by using the quantile loss function in our node splitting criterion. The efficiency of our algorithm stems from new online update procedures for both the quantile function and the quantile loss function. We test the quantile tree algorithm in three ways: by comparing its running time against implementations of standard regression trees, by demonstrating its ability to recover a known set of nonlinear quantile functions, and by showing that quantile trees yield smaller test set errors (computed using mean absolute deviation) than standard regression trees. The tests include training sets with up to 16 million instances. Overall, our results enable future use of quantile regression trees for large-scale data mining.
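To make the splitting criterion concrete, the following is a minimal Python sketch of a quantile-loss split search on a single feature. It is not the algorithm of the paper: it recomputes the quantile and the loss from scratch for every candidate threshold, which is far slower than O(N log N), whereas the abstract attributes the speed of the actual method to online updates of both quantities. The function names (`quantile_loss`, `empirical_quantile`, `node_loss`, `best_split`), the lower empirical quantile convention, and the midpoint threshold are all assumptions made for illustration.

```python
import math


def quantile_loss(residual, tau):
    """Quantile (pinball) loss of one residual r = y - q at level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile(values, tau):
    """Lower empirical tau-quantile of a non-empty list (assumed convention)."""
    ordered = sorted(values)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


def node_loss(values, tau):
    """Total quantile loss when a node predicts its own empirical quantile."""
    q = empirical_quantile(values, tau)
    return sum(quantile_loss(y - q, tau) for y in values)


def best_split(xs, ys, tau):
    """Naive threshold search on one feature; returns (threshold, loss).

    Every candidate recomputes both child losses from scratch. This is the
    step that an online update scheme would replace.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_loss = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        loss = node_loss(left, tau) + node_loss(right, tau)
        if loss < best_loss:
            # Midpoint threshold is an illustrative choice.
            best_threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
            best_loss = loss
    return best_threshold, best_loss


if __name__ == "__main__":
    xs = [1, 2, 3, 10, 11, 12]
    ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    print(best_split(xs, ys, tau=0.5))
```

With tau = 0.5 the criterion reduces to minimizing absolute deviation around the median of each child node, which is consistent with the abstract's use of mean absolute deviation as the test error.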

Full Text