Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking

Aditya Tayal, Thomas F Coleman, Yuying Li

doi:10.1007/s10618-017-0540-z

Abstract

The rapid growth in data accumulation has produced large-scale data mining problems, many of which have intrinsically unbalanced or rare class distributions. Standard classification algorithms, which focus on overall classification accuracy, often perform poorly in these cases. Recently, Tayal et al. (IEEE Trans Knowl Data Eng 27(12):3347–3359, 2015) proposed a kernel method called RankRC for large-scale unbalanced learning. RankRC uses a ranking loss to overcome biases inherent in standard classification-based loss functions, while achieving computational efficiency by enforcing a rare class hypothesis representation. In this paper we establish a theoretical bound for RankRC by showing that instantiating a hypothesis using a subset of training points is equivalent to instantiating a hypothesis using the full training set with the feature mapping replaced by its orthogonal projection onto the span of the subset's feature vectors. This bound suggests that, when choosing the subset of data points for a hypothesis representation, it is optimal to select points from the rare class first. In addition, we show that, for an arbitrary loss function, the Nyström kernel matrix approximation is equivalent to instantiating a hypothesis using a subset of data points. Consequently, a theoretical bound for the Nyström kernel SVM can be derived from a perturbation analysis of the orthogonal projection in the feature mapping. This generally yields a tighter bound than a perturbation analysis based on the kernel matrix approximation. To further illustrate the computational effectiveness of RankRC, we apply a multi-level rare class kernel ranking method to the Heritage Health Provider Network's health prize competition problem and compare the performance of RankRC with that of other existing methods.
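The equivalence stated above between the Nyström kernel matrix approximation and an orthogonally projected feature mapping can be checked numerically. The following is a minimal sketch, not the authors' code, under these assumptions: a degree-2 polynomial kernel on 2-D inputs (chosen only because its feature map can be written out explicitly), synthetic random data with a hypothetical rare class, and hypothetical helper names `poly2_kernel`, `poly2_features`, `nystrom_matrix` and `projected_gram`. The landmark subset is taken to be the rare-class points, mirroring the rare-class-first selection the bound suggests.

```python
import numpy as np

# Illustrative sketch only: a degree-2 polynomial kernel is assumed because its
# feature map phi can be written out explicitly, so the projection can be checked.

def poly2_kernel(X, Y):
    """k(x, y) = (x . y + 1)^2 for all row pairs of X and Y."""
    return (X @ Y.T + 1.0) ** 2

def poly2_features(X):
    """Explicit phi for 2-D inputs, satisfying phi(x) . phi(y) = (x . y + 1)^2."""
    x1, x2 = X[:, 0], X[:, 1]
    r2 = np.sqrt(2.0)
    return np.column_stack([np.ones(len(X)), r2 * x1, r2 * x2,
                            x1 ** 2, x2 ** 2, r2 * x1 * x2])

def nystrom_matrix(X, subset):
    """Nystrom approximation K_nS K_SS^+ K_Sn built from the landmark rows X[subset]."""
    K_nS = poly2_kernel(X, X[subset])
    K_SS = poly2_kernel(X[subset], X[subset])
    return K_nS @ np.linalg.pinv(K_SS) @ K_nS.T

def projected_gram(X, subset):
    """Exact Gram matrix of P phi(x), with P the orthogonal projector onto
    span{phi(x_s) : s in subset}."""
    Phi = poly2_features(X)        # n x d feature matrix, one row per point
    Phi_S = Phi[subset]            # m x d landmark features
    # Projector onto the row space of Phi_S: symmetric and idempotent (P @ P = P)
    P = Phi_S.T @ np.linalg.pinv(Phi_S @ Phi_S.T) @ Phi_S
    Phi_proj = Phi @ P             # row i is (P phi(x_i))^T because P is symmetric
    return Phi_proj @ Phi_proj.T

# Hypothetical unbalanced data: 4 rare-class points (label 1) out of 40.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.zeros(40, dtype=int)
y[:4] = 1
subset = np.flatnonzero(y == 1)    # rare class selected first as the landmark subset

K_nys = nystrom_matrix(X, subset)
K_proj = projected_gram(X, subset)
print("max |K_nys - K_proj| =", np.abs(K_nys - K_proj).max())
```

Algebraically, Phi P Phi^T expands to K_nS K_SS^+ K_Sn, so the two matrices should agree up to floating-point rounding. On the landmark rows themselves both reduce to the exact kernel matrix K_SS, since P leaves phi(x_s) unchanged for every point in the subset.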

Journal: Data Mining and Knowledge Discovery
Publication Date: Sep 8, 2017
Citations: 50

Similar Papers

RankRC: Large-Scale Nonlinear Rare Class Ranking
Aditya Tayal ... Thomas F Coleman
IEEE Transactions on Knowledge and Data Engineering | VOL. 27
01 Dec 2015

Machine Learning Models for Classifying Imbalanced Class Datasets Using Ensemble Learning
Aditya Yulis Kusdiyanto ... Yoga Pristyanto
08 Dec 2022

Adaptive cost-sensitive stance classification model for rumor detection in social networks
Zahra Zojaji ... Behrouz Tork Ladani
Social Network Analysis and Mining | VOL. 12
09 Sep 2022

Classification with Local Clustering in Imbalanced Data Sets
Hua Ji ... Hua Xiang Zhang
Advanced Materials Research | VOL. 219-220
01 Mar 2011
