Compressed labeling on distilled labelsets for multi-label learning

Tianyi Zhou, Dacheng Tao, Xindong Wu

doi:10.1007/s10994-011-5276-1

Abstract

Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed because of the imbalance, dependence and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a {0,1} label matrix compression and recovery method termed compressed labeling (CL) to simultaneously solve or at least reduce these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, we directly apply popular binary classification methods (e.g., support vector machines) to each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. In the recovery algorithm, a labelset distilling method is designed to extract distilled labelsets (DLs), i.e., the frequently appearing label subsets of the original labels, via recursive clustering and subtraction. Given a distilled and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed from a geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distribution of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for the compressed labels, and makes use of label dependence via DL-based tests. Theoretically, we prove recovery bounds for CL which verify its effectiveness for label compression, as well as the improvement in multi-label classification performance brought by the label correlations preserved in DLs. We show the effectiveness, efficiency and robustness of CL via 5 groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics and web page classification.
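The compression step and the geometric fact behind the recovery tests can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the projection matrix `A`, the choice to map non-positive projections to 0, and the synthetic data are all assumptions made for illustration. The agreement formula used here is the standard result that, for a Gaussian random direction, two vectors at angle θ receive the same sign with probability 1 − θ/π; the joint distribution derived in the paper may be stated differently.

```python
import numpy as np


def compress_labels(Y, k, seed=0):
    """Compress an n x d {0,1} label matrix into n x k binary labels.

    Assumption: each new label is 1 where a Gaussian random projection of
    the label vector is positive and 0 otherwise (an all-zero label vector
    projects to 0 and therefore maps to all zeros).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y.shape[1], k))  # hypothetical projection matrix
    Z = (Y @ A > 0).astype(np.int8)
    return Z, A


def sign_agreement_probability(u, v):
    """P[sign(a.u) == sign(a.v)] for a standard Gaussian a: 1 - angle/pi."""
    cos = float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi


# Synthetic, sparse (imbalanced) labels: about 5% positives per label.
rng = np.random.default_rng(1)
Y = (rng.random((2000, 40)) < 0.05).astype(float)
Z, A = compress_labels(Y, k=10)
print("positive rate before:", Y.mean(), "after:", Z.mean())

# Geometric check: the angle between these two vectors is pi/4, so the
# analytic agreement probability is 3/4.
u = np.array([1.0, 1.0, 0.0, 0.0])  # stands in for an original label vector
v = np.array([1.0, 0.0, 0.0, 0.0])  # stands in for a distilled labelset
G = rng.standard_normal((200_000, 4))  # Monte Carlo Gaussian directions
empirical = np.mean((G @ u > 0) == (G @ v > 0))
analytic = sign_agreement_probability(u, v)
print("agreement analytic:", analytic, "empirical:", empirical)
```

In this toy setting the compressed labels come out far closer to balanced than the sparse originals, and only 10 binary classifiers would need training instead of 40. The recovery procedure (labelset distilling and the Kullback-Leibler divergence based tests) is not sketched here.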

Journal: Machine Learning	Publication Date: Jan 6, 2012
Citations: 115

Similar Papers

A nonlinear multi-label learning model based on Tanh mapping
Changzhong Wang ... Yang Huang
Engineering Applications of Artificial Intelligence | VOL. 126
02 Aug 2023

Robust non-negative sparse graph for semi-supervised multi-label learning with missing labels
Jianghong Ma ... Tommy W.S Chow
Information Sciences | VOL. 422
30 Aug 2017

Pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores
Aleksandros Sobczyk ... Efstratios Gallopoulos
ACM Transactions on Mathematical Software | VOL. 48
19 Dec 2022

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection
Ata Kaban
10 Aug 2015
