Multi-instance learning is a classic paradigm within the weakly supervised learning framework. Previous approaches to multi-instance learning typically assume that every bag comes with its own label. In real-world applications, however, obtaining fully labeled bags can be challenging and resource-intensive, owing to the substantial time and labor required to label each instance. Fortunately, collecting bags that contain only positive and unlabeled instances is far more feasible. In this paper, we aim to learn an efficient binary classifier from positive and unlabeled bags. To address the absence of negative bags, we adopt a novel strategy that combines label-noise learning with multi-instance kernels. Our approach begins with an analysis of empirical risk minimization under label noise, which allows us to decompose the loss on negative bags carrying false-negative labels into two distinct parts, of which only the second is affected by the noisy labels. To mitigate the influence of these noisy labels on the negative bags, we propose the Multi-Instance Kernel-based Centroid Estimation (MIKCE) technique. MIKCE is versatile and can be applied to both linear and nonlinear data. We further derive an upper bound on the generalization error, establishing the convergence of the MIKCE algorithm. Finally, we conduct numerical experiments on 31 public datasets to empirically validate the effectiveness of MIKCE.
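As background for the multi-instance kernels the abstract relies on, the following sketch shows the standard normalized set kernel, which averages an instance-level kernel over all instance pairs from two bags. This is the textbook construction, not necessarily the exact kernel used by MIKCE; the names `rbf`, `mi_kernel`, and the `gamma` parameter are illustrative choices.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Standard RBF kernel between two instance vectors.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mi_kernel(bag_a, bag_b, gamma=1.0):
    # Normalized multi-instance (set) kernel: the average pairwise
    # instance kernel between all instances of the two bags.
    total = sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))

# Two toy bags, each a list of 2-D instances.
A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([0.0, 0.0])]
print(mi_kernel(A, B))
```

Because the bag-level kernel is built from a positive-definite instance kernel, it can be plugged directly into any kernel method (e.g. an SVM) to classify whole bags rather than individual instances.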