Multi-label classification (MLC) is a challenging problem when the number of labels is large. One simple strategy that has appeared in the recent literature is to embed the labels in a latent binary subspace with autoencoders and then train binary classifiers to predict each latent binary variable independently. The latent predictions are then fed to the decoder to produce the final label estimate. The goal is not only to reduce classification time, but also to capture implicitly some useful information about the dependency structure of the labels. Despite being pleasingly simple, we show that this technique has shortcomings and that information about the dependencies between latent variables must be incorporated into the learning process to solve the MLC task efficiently under the zero-one loss. Our contribution is two-fold: i) we propose a "volume-preserving" neural-based binary stochastic autoencoder (BSAE) that guarantees that the maximum a posteriori (MAP) solution in the latent space is decoded as the Bayes-optimal solution in the original multi-label space for the zero-one loss, and ii) we apply the method to identify a factorization of the latent variables into a product of conditionally independent terms, facilitating the estimation of the MAP solution. Our experiments on multiple datasets confirm our hypothesis that basic autoencoders do not necessarily disentangle the dependency structure of the label space, and that exploiting latent-variable dependencies brings significant gains in terms of zero-one loss.
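
The sketch below illustrates only the generic label-embedding baseline that the abstract critiques (autoencode the label vectors into a relaxed binary code, train one independent classifier per latent bit, decode the predicted bits), not the proposed BSAE; the synthetic data, dimensions, and hyperparameters are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

n, d, L, k = 2000, 50, 100, 16   # samples, features, labels, latent bits (all assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d)).astype(np.float32)
W = rng.normal(size=(d, L))
Y = (X @ W + rng.normal(size=(n, L)) > 1.0).astype(np.float32)   # synthetic multi-labels

# 1) Label autoencoder with a (sigmoid-relaxed) binary bottleneck.
encoder = nn.Sequential(nn.Linear(L, 64), nn.ReLU(), nn.Linear(64, k), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, L))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
Yt = torch.from_numpy(Y)
for _ in range(200):
    opt.zero_grad()
    z = encoder(Yt)                      # soft latent codes in (0, 1)
    loss = nn.functional.binary_cross_entropy_with_logits(decoder(z), Yt)
    loss.backward()
    opt.step()

# 2) One independent binary classifier per latent bit (the step the paper argues
#    ignores latent-variable dependencies).
with torch.no_grad():
    Z = (encoder(Yt) > 0.5).float().numpy()          # hard binary training targets
clfs = []
for j in range(k):
    if Z[:, j].min() == Z[:, j].max():               # degenerate bit: store the constant
        clfs.append(float(Z[0, j]))
    else:
        clfs.append(LogisticRegression(max_iter=1000).fit(X, Z[:, j]))

# 3) Predict latent bits independently, then decode them back to labels.
def predict(X_new: np.ndarray) -> np.ndarray:
    cols = [np.full(len(X_new), c) if isinstance(c, float) else c.predict(X_new)
            for c in clfs]
    z_hat = torch.from_numpy(np.stack(cols, axis=1).astype(np.float32))
    with torch.no_grad():
        return (torch.sigmoid(decoder(z_hat)) > 0.5).numpy().astype(int)

Y_hat = predict(X)
print("subset (zero-one) accuracy:", (Y_hat == Y).all(axis=1).mean())
```

Because each latent bit is predicted in isolation, the combination of predicted bits need not correspond to a high-probability joint code, which is the shortcoming the paper's volume-preserving BSAE and latent factorization are designed to address.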