Abstract

In the multiple instance learning framework, instances are arranged into bags; each bag contains several instances, and labels are available only for the bags, not for the individual instances. In single-instance learning, by contrast, each instance is represented by a single feature vector and carries its own label. This paper examines the distinction between these paradigms to determine whether it is appropriate to cast the problem within a multiple instance framework. In single-instance learning, two datasets (a students' dataset and the iris dataset) are classified using the Naïve Bayes Classifier (NBC), Multilayer Perceptron (MLP), and a Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), while SimpleMI, MIWrapper, and MIBoost are used in multiple instance learning. Leave-One-Out Cross-Validation (LOOCV) and five- and ten-fold cross-validation (5-CV, 10-CV) are applied to evaluate the classification results. A comparison of the results shows that several algorithms are more effective for classification in the multiple instance setting: the most suitable algorithm for the students' dataset is MIBoost with MLP under LOOCV, with an accuracy of 75%, whereas for the iris dataset it is SimpleMI with SMO under 10-CV, with an accuracy of 99.33%.
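
As a rough illustration of the single-instance evaluation protocol, the sketch below evaluates a polynomial-kernel SVM on the iris dataset under LOOCV, 5-CV, and 10-CV. It uses scikit-learn in Python purely for illustration; the specific estimator, kernel settings, and library are assumptions, not the paper's implementation.

    # Illustrative sketch (assumption: scikit-learn) of the single-instance
    # evaluation protocol on the iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    svm = SVC(kernel="poly")  # polynomial-kernel SVM, analogous to SMO with a poly kernel

    # Leave-One-Out Cross-Validation (LOOCV)
    loo_acc = cross_val_score(svm, X, y, cv=LeaveOneOut()).mean()

    # Five- and ten-fold cross-validation (5-CV, 10-CV)
    cv5_acc = cross_val_score(svm, X, y, cv=5).mean()
    cv10_acc = cross_val_score(svm, X, y, cv=10).mean()

    print(f"LOOCV: {loo_acc:.4f}  5-CV: {cv5_acc:.4f}  10-CV: {cv10_acc:.4f}")

The reported accuracies are means over the held-out folds; the same protocol applies to the NBC and MLP classifiers by swapping the estimator.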

Highlights

  • Multiple Instance Learning (MIL) is a generalization of supervised learning, used to learn a concept that correctly labels the available training data and generalizes to unseen data

  • Sequential Minimal Optimization (SMO) with a polynomial kernel is used as the Support Vector Machine (SVM) classifier, and the same algorithms listed above are also applied to the iris dataset using single-instance learning

  • In this paper, MIL is compared with single-instance learning to determine whether it is appropriate to cast the problem within MIL

Introduction

Multiple Instance Learning (MIL) is a generalization of supervised learning, used to learn a concept that correctly labels the available training data and generalizes to unseen data. MIL can be applied to classification and regression problems, using algorithms that learn from a set of labeled bags. In single-instance learning, each object is represented by a single feature vector and is associated with its own label; this is the traditional representation in machine learning. In multiple instance learning, the feature vectors are known as instances and the object is known as a bag: each bag contains several instances, each described by its own feature vector, and the labels are attached to the bags rather than to the instances. The data are divided into positive and negative bags. MIL typically involves binary classification, where each instance conceptually belongs to one of two classes, positive or negative, but only the bag labels are observed. The advantage of MIL is that it resolves this label ambiguity for the instances within a bag.
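
To make the bag representation concrete, the following minimal sketch (in Python with numpy and scikit-learn, which are assumptions for illustration and not the paper's toolchain) collapses each bag into a single feature vector by averaging its instances, in the spirit of SimpleMI, so that an ordinary single-instance classifier can be trained on the bag-level labels.

    # Minimal sketch of the multiple instance representation and a
    # SimpleMI-style reduction to single-instance learning.
    # Assumption (not from the paper): each bag is summarized by the
    # arithmetic mean of its instances.
    import numpy as np
    from sklearn.svm import SVC

    # Each bag is an (n_instances, n_features) array; the label is
    # attached to the bag, not to its individual instances.
    bags = [
        np.array([[0.2, 1.1], [0.4, 0.9]]),              # bag 1 (2 instances)
        np.array([[1.5, 0.3], [1.7, 0.2], [1.6, 0.4]]),  # bag 2 (3 instances)
    ]
    bag_labels = np.array([0, 1])  # negative bag, positive bag

    # Aggregate each bag into one feature vector (the per-feature mean),
    # yielding an ordinary single-instance dataset.
    X = np.vstack([bag.mean(axis=0) for bag in bags])

    # Any single-instance classifier can now be trained on bag-level labels;
    # a polynomial-kernel SVM mirrors the SMO/poly-kernel setup.
    clf = SVC(kernel="poly", degree=2).fit(X, bag_labels)
    print(clf.predict(X))

Broadly, MIWrapper and MIBoost also reuse single-instance learners but operate at the instance level, propagating the bag label to each instance and then combining the instance-level predictions into a bag-level decision.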
