Abstract

• New tree-based method for ranking tasks in imbalanced settings.
• Direct optimization of the average precision during the meta-tree construction.
• Comparison to tree-based methods as well as random forests and gradient tree boosting methods.
• Inference of interpretable models to support decision making for tax fraud detection.

In this paper, we address the challenging problem of learning to rank from highly imbalanced data. This scenario calls for specific metrics that account for the scarcity of the so-called positive examples. We present MetaAP, a tree-based ranking algorithm that induces meta-trees by directly optimizing the Average Precision (AP) during the learning process. AP has been shown to be more relevant than the area under the ROC curve (AUC-ROC) when the objective is to push the examples of interest to the very top of the list. This property of AP is particularly desirable in tree-based ranking for fraud detection tasks, where (i) the budget is often constrained (in terms of possible controls) and (ii) the interpretability of the induced models is required to support decision making. After an extensive comparative study on 28 public datasets showing that MetaAP is significantly better than other tree-based ranking methods, we tackle a tax fraud detection task arising from a partnership with the French Ministry of Economy and Finance. The results show that MetaAP is able to make the tax audit process much more efficient.
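To make the contrast between AP and AUC-ROC concrete, here is a minimal sketch in Python. It is not the MetaAP algorithm; it only computes the two metrics on two hypothetical rankings of 100 examples with 2 positives each. The helper names (`average_precision`, `auc_roc`, `ranking`) and the toy data are assumptions made for illustration.

```python
def average_precision(labels):
    """AP of a ranked list of binary labels (1 = positive), best-ranked first."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def auc_roc(labels):
    """AUC-ROC of a ranked list: share of (positive, negative) pairs in the right order."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    negatives_seen = 0
    correct_pairs = 0
    for label in labels:
        if label == 1:
            # Every negative ranked below this positive is a correctly ordered pair.
            correct_pairs += negatives - negatives_seen
        else:
            negatives_seen += 1
    return correct_pairs / (positives * negatives)


def ranking(n, positive_ranks):
    """Hypothetical ranked list of n examples with positives at the given 1-based ranks."""
    return [1 if rank in positive_ranks else 0 for rank in range(1, n + 1)]


# Toy data (assumed): 2 positives among 100 examples.
ranking_a = ranking(100, {1, 21})   # one positive at the very top
ranking_b = ranking(100, {10, 12})  # both positives below the top

for name, labels in [("A", ranking_a), ("B", ranking_b)]:
    print(f"ranking {name}: AUC-ROC={auc_roc(labels):.3f}  AP={average_precision(labels):.3f}")
# ranking A: AUC-ROC=0.903  AP=0.548
# ranking B: AUC-ROC=0.903  AP=0.133
```

Both rankings order the same number of positive/negative pairs correctly, so AUC-ROC cannot tell them apart, while AP rewards ranking A for placing a positive first. With a budget of only a few controls, ranking A is the more useful one.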

