Abstract
Ranking problems, also known as preference learning problems, define a widespread class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, chemistry, credit risk screening, image ranking and media memorability. While reviews already exist for specific types of ranking problems, such as label and object ranking problems, there does not yet seem to be an overview of instance ranking problems that both covers developments in distinguishing between the different types of instance ranking problems and carefully discusses their differences and the applicability of existing ranking algorithms to them. In instance ranking, one explicitly takes the responses into account with the goal of inferring a scoring function that directly maps feature vectors to real-valued ranking scores, in contrast to object ranking problems, where the ranks are given as preference information and the goal is to learn a permutation. In this article, we systematically review different types of instance ranking problems and the corresponding loss functions and goodness criteria. We discuss the difficulties that arise when trying to optimize those criteria. To provide a detailed and comprehensive overview of existing machine learning techniques for solving such ranking problems, we systematize existing techniques and recapitulate the corresponding optimization problems in a unified notation. We also discuss which of the instance ranking problems the respective algorithms are tailored to and identify their strengths and limitations. Computational aspects and open research problems are also considered.
Highlights
Search engines like Google provide a list of websites that suit the user's query, in the sense that the websites displayed first are expected to be the most relevant ones
We include a few ranking algorithms that are primarily designed for object ranking problems but can be applied directly to instance ranking problems
A common technique is to optimize a sufficiently regular surrogate loss. This principle has already entered instance ranking problems, and the algorithms that we review in Sect. 4 operate on particular surrogate losses
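To illustrate the surrogate-loss idea mentioned in the last highlight, the following minimal sketch compares a hard pairwise misranking loss (the fraction of wrongly ordered pairs) with a pairwise hinge loss on score differences. The hinge loss is only one common surrogate and is chosen here as an assumption; the function name `pairwise_losses` and the toy data are hypothetical and are not taken from the article.

```python
def pairwise_losses(scores, y):
    """Return (hard_loss, hinge_loss) averaged over all pairs with y[i] > y[j].

    hard_loss  : fraction of pairs that the scores order wrongly (ties count as errors)
    hinge_loss : mean of max(0, 1 - (scores[i] - scores[j])), an upper bound
                 on the hard loss that is convex in the scores
    """
    n = len(y)
    # All ordered pairs (i, j) where instance i should be ranked above instance j.
    pairs = [(i, j) for i in range(n) for j in range(n) if y[i] > y[j]]
    if not pairs:
        raise ValueError("need at least one pair with distinct responses")

    hard = sum(1.0 for i, j in pairs if scores[i] <= scores[j]) / len(pairs)
    hinge = sum(max(0.0, 1.0 - (scores[i] - scores[j])) for i, j in pairs) / len(pairs)
    return hard, hinge


# Toy example (assumed values): real-valued responses and predicted scores.
y = [3.0, 1.0, 2.0]
scores = [0.2, 0.5, 1.5]
hard, hinge = pairwise_losses(scores, y)
```

The hard loss is piecewise constant in the scores and therefore gives no useful gradient information, whereas the hinge surrogate varies continuously with the score differences; this is the kind of regularity that makes a surrogate loss amenable to numerical optimization.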
Summary
Search engines like Google provide a list of websites that suit the user's query, in the sense that the websites displayed first are expected to be the most relevant ones. The training data consist of a single set or multiple sets of instances or instance-query pairs, where each instance is composed of a feature vector and a certain type of preference information. One can further divide ranking problems into label, instance and object ranking problems, depending on the preference information (Fürnkranz and Hüllermeier 2011). The goal is to learn a scoring function that assigns a real-valued ranking score to new feature vectors. Once such a scoring function has been learned, the scores of new feature vectors directly induce a ranking of them, inherited from the natural ordering on the real line. We exclude all algorithms that operate on ranks (which may be computed in a pre-processing step from actual real-valued responses), an approach taken in many information retrieval works.
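The step from a learned scoring function to a ranking can be sketched as follows. This is a minimal illustration under stated assumptions: the linear form of the scoring function, the weight vector `w`, and the helper names `linear_score` and `rank_instances` are hypothetical and do not correspond to any specific algorithm reviewed in the article.

```python
def linear_score(x, w):
    """Hypothetical linear scoring function s(x) = <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rank_instances(X, w):
    """Score every feature vector and return (scores, order).

    `order` lists instance indices from the highest to the lowest score,
    i.e. the ranking induced by the natural ordering of the real numbers.
    """
    scores = [linear_score(x, w) for x in X]
    # sorted() is stable, so instances with equal scores keep their input order.
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return scores, order


# New feature vectors and a weight vector assumed to come from some training step.
X_new = [[1.0, 0.0],
         [0.0, 2.0],
         [1.0, 1.0]]
w = [0.5, 1.0]

scores, order = rank_instances(X_new, w)
```

Here the second instance receives the highest score and is ranked first, followed by the third and then the first. No permutation is learned directly; the ranking is simply read off from the real-valued scores.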