Abstract

Machine learning has become an everyday tool in so many fields that software for running its algorithms exists for nearly every device, from supercomputers to embedded appliances. Most of these methods fall into the category known as standard learning: supervised models (guided by pre-labeled examples) that aim to classify new patterns into exactly one category. This is how machine learning filters junk email, labels people in a picture, or detects a fraudulent credit card transaction. Aside from unsupervised learning methods, which are usually applied to group similar patterns, infer association rules and perform similar tasks, several non-standard supervised machine learning problems have been tackled in recent years. Among them, multilabel learning is arguably the most popular. Its algorithms aim to produce models in which each data pattern may be linked to several categories at once. Thus, a multilabel classifier generates a set of outputs instead of the single output of a standard classifier. However, software tools for multilabel learning tend to be scarce. This paper provides multilabel researchers with a comprehensive review of the currently available multilabel learning software. It follows a didactic approach, focusing on how to accomplish each task rather than simply offering a list of programs and websites. The goal is to help readers find the most appropriate resource for every step, from locating datasets and partitioning them to running many of the multilabel algorithms proposed in the literature so far.

Highlights

  • The availability of software such as R’s caret package [1], Matlab’s Machine Learning toolbox [2], Java’s WEKA application [3] and Python’s scikit-learn package [4], to mention only a few of the existing alternatives, puts data analysis and data mining capabilities at the fingertips of researchers, students and practitioners

  • In addition to measurements based on the aforementioned confusion matrix, in multilabel learning [6] (MLL) there is another group known as ranking-based metrics

  • The main multilabel exploratory data analysis (EDA) tool for R is found in the mldr [40] package

Summary

INTRODUCTION

The availability of software such as R’s caret package [1], Matlab’s Machine Learning toolbox [2], Java’s WEKA application [3] and Python’s scikit-learn package [4], to mention only a few of the existing alternatives, puts data analysis and data mining capabilities at the fingertips of researchers, students and practitioners. A typical assumption is that each data pattern is linked to only one label. Sometimes this label can take only one of two values, e.g. an email is either spam or it is not. Most exploratory data analysis (EDA) and data mining (DM) software tools available nowadays are designed to work with binary and multiclass data. Single-label learning, known as standard learning, is probably the most common scenario in EDA and DM tasks, but it is certainly not the only one. We are interested in multilabel learning (MLL), since it is the most common case of non-standard learning.
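The difference between single-label and multilabel targets can be illustrated with a small sketch. It is not taken from the paper or from any of the packages cited above: the helper `to_indicator`, the label names and the toy data are all hypothetical, and only the Python standard library is used. Each target is encoded as a row of 0/1 values with one column per label, a common way of representing label sets.

```python
from typing import Dict, List, Set


def to_indicator(label_sets: List[Set[str]], labels: List[str]) -> List[List[int]]:
    """Encode each label set as a 0/1 row with one column per known label."""
    index: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    rows = []
    for label_set in label_sets:
        row = [0] * len(labels)
        for label in label_set:
            row[index[label]] = 1
        rows.append(row)
    return rows


# Hypothetical label vocabulary for a toy image-tagging task.
LABELS = ["beach", "mountain", "sunset"]

# Standard (multiclass) learning: every pattern has exactly one label.
multiclass_targets = [{"beach"}, {"sunset"}]

# Multilabel learning: a pattern may have several labels, or none at all.
multilabel_targets = [{"beach", "sunset"}, set(), {"mountain"}]

multiclass_matrix = to_indicator(multiclass_targets, LABELS)
multilabel_matrix = to_indicator(multilabel_targets, LABELS)

for name, matrix in [("multiclass", multiclass_matrix),
                     ("multilabel", multilabel_matrix)]:
    print(name, matrix)
```

In the multiclass matrix every row contains exactly one 1, whereas rows of the multilabel matrix may contain any number of 1s. This is the structural difference that standard single-label tools generally do not handle directly.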

Charte
EXPLORATORY DATA ANALYSIS TOOLS
MULTILABEL EXPLORATORY DATA ANALYSIS PYTHON TOOLS
PARTITIONING AND TRANSFORMING MULTILABEL DATASETS
COMMON DATA TRANSFORMATIONS