Abstract

Recursive partitioning methods are among the most popular techniques in machine learning. The purpose of this paper is to investigate how to adapt this methodology to the bipartite ranking problem. Following in the footsteps of the TreeRank approach developed in Clémençon and Vayatis (Proceedings of the 2008 Conference on Algorithmic Learning Theory, 2008 and IEEE Trans. Inf. Theory 55(9):4316–4336, 2009), we present tree-structured algorithms designed for learning to rank instances based on classification data. The main contributions of the present work are the following: the practical implementation of the TreeRank algorithm, together with well-founded solutions to the crucial issues of the splitting rule and the choice of the “right” size for the ranking tree. From the angle embraced in this paper, splitting is viewed as a cost-sensitive classification task with data-dependent costs. Hence, up to straightforward modifications, any classification algorithm may serve as a splitting rule. We also propose to apply a cost-complexity pruning method after the growing stage in order to produce a “right-sized” ranking sub-tree with large AUC. In particular, performance bounds are established for pruning schemes inspired by recent work on nonparametric model selection. Finally, we propose indicators for variable importance and variable dependence, and present various simulation studies illustrating the potential of our method.
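
To make the splitting idea concrete, below is a minimal sketch, not the authors' implementation: it grows a TreeRank-style ranking tree in which each node's split is obtained by fitting an off-the-shelf classifier (here scikit-learn's DecisionTreeClassifier used as a stump) on data reweighted by the local class frequencies, which is one way to realize a cost-sensitive split with data-dependent costs. The function names grow_treerank and score, and the parameters max_depth and min_samples, are illustrative choices, not from the paper; the pruning stage is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_treerank(X, y, depth=0, max_depth=3, min_samples=20):
    """Grow a TreeRank-style ranking tree (illustrative sketch only).

    At each node the split is obtained by solving a cost-sensitive
    classification problem whose class weights depend on the local
    proportions of positives and negatives.
    """
    n_pos, n_neg = int((y == 1).sum()), int((y == 0).sum())
    if depth >= max_depth or n_pos == 0 or n_neg == 0 or len(y) < min_samples:
        return {"leaf": True, "score": n_pos / max(n_pos + n_neg, 1)}

    # Data-dependent costs: weight each class by the inverse of its local
    # frequency, so the weighted error reflects the local ranking gain.
    weights = np.where(y == 1, 1.0 / n_pos, 1.0 / n_neg)

    # Any classifier can serve as the splitting rule; a depth-1 tree
    # (a stump) is used here purely for illustration.
    splitter = DecisionTreeClassifier(max_depth=1)
    splitter.fit(X, y, sample_weight=weights)
    mask = splitter.predict(X) == 1  # "left" cell: predicted-positive region

    if mask.all() or (~mask).all():  # degenerate split, stop here
        return {"leaf": True, "score": n_pos / (n_pos + n_neg)}

    return {
        "leaf": False,
        "splitter": splitter,
        "left": grow_treerank(X[mask], y[mask], depth + 1, max_depth, min_samples),
        "right": grow_treerank(X[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

def score(node, x):
    """Rank a single instance by the local rate of positives in its leaf."""
    while not node["leaf"]:
        goes_left = node["splitter"].predict(x.reshape(1, -1))[0] == 1
        node = node["left"] if goes_left else node["right"]
    return node["score"]
```

Instances can then be ranked by sorting their score values in decreasing order; in this sketch the resulting ordering is piecewise constant on the cells of the learned partition, as with any scoring rule produced by recursive partitioning.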
