Abstract

Multiple instance regression (MIR) operates on a collection of bags, where each bag contains many instances that share the same real-valued label. Only a few instances, called primary instances, contribute to the bag label; the remaining ones are noisy observations. The goal in MIR is to identify the primary instances within each bag and learn a regression model that can predict the label of a previously unseen bag. In this paper, we show that regression models can be identified as clusters when appropriate features and distances are used. We introduce an algorithm, called Robust Fuzzy Clustering for Multiple Instance Regression (RFC-MIR), that can learn multiple linear models simultaneously. First, RFC-MIR uses constrained fuzzy memberships to obtain an initial partition in which instances can belong to multiple models to varying degrees. Then, it uses unconstrained possibilistic memberships to allow the initial local models to expand and converge to the global model. These memberships are also used to identify the primary instances within each bag. After clustering, the possibilistic memberships are used to identify the optimal number of regression models. We evaluate our approach on synthetic data sets generated by varying the dimensionality of the feature space, the number of instances per bag, and the noise level. We also validate RFC-MIR on two real applications: predicting the yearly average yield of a crop from remote sensing data, and predicting drug activity. Both applications have been used consistently to validate existing MIR algorithms. We show that our approach achieves higher accuracy than existing methods.
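The difference between the two membership types named in the abstract can be illustrated with a minimal sketch. It is not RFC-MIR itself: it applies the standard fuzzy c-means membership and the Krishnapuram-Keller possibilistic membership to squared regression residuals. The 1-D linear models, the `eta` scale values, the example bag, and the `pick_primary` selection rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # hypothetical model type: (slope, intercept)


def squared_residuals(x: float, y: float, models: Sequence[Line]) -> List[float]:
    """Squared error of one instance (feature x, bag label y) under each linear model."""
    return [(y - (a * x + b)) ** 2 for a, b in models]


def fuzzy_memberships(d: Sequence[float], m: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Constrained memberships: models compete for the instance and the values sum to 1."""
    inv = [(1.0 / (r + eps)) ** (1.0 / (m - 1.0)) for r in d]
    total = sum(inv)
    return [v / total for v in inv]


def possibilistic_memberships(
    d: Sequence[float], etas: Sequence[float], m: float = 2.0
) -> List[float]:
    """Unconstrained memberships: each value lies in (0, 1] and ignores the other models."""
    return [1.0 / (1.0 + (r / eta) ** (1.0 / (m - 1.0))) for r, eta in zip(d, etas)]


def pick_primary(
    bag_x: Sequence[float], y: float, models: Sequence[Line], etas: Sequence[float]
) -> int:
    """Assumed rule: the primary instance has the highest possibilistic membership in any model."""
    scores = [
        max(possibilistic_memberships(squared_residuals(x, y, models), etas))
        for x in bag_x
    ]
    return max(range(len(bag_x)), key=scores.__getitem__)


# Illustrative data (made up): two candidate linear models and one bag of three instances.
models = [(2.0, 0.0), (-1.0, 5.0)]
etas = [1.0, 1.0]          # assumed per-model scale parameters
bag_x = [2.0, 7.0, 0.3]    # instance features; all share the bag label below
bag_label = 4.0

primary_index = pick_primary(bag_x, bag_label, models, etas)
print("primary instance index:", primary_index)
```

Because the fuzzy memberships must sum to 1, a noisy instance that fits no model well still receives substantial membership somewhere. The possibilistic memberships have no such constraint, so an instance far from every model scores low everywhere, which is what makes them usable for flagging noise.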

