Abstract

Over the last few years, multi-label learning has received considerable attention in both research and industry. Because a pattern can belong to more than one class at the same time, classifying a test pattern is a challenging task. Multi-label classification algorithms also take a long time to run when inferring on large data sets, so there is a growing demand for methods that are both effective and efficient, in terms of accuracy and speed. We endeavour to improve the performance and accuracy of a multi-label classification algorithm which, given a pattern, predicts the set of labels it belongs to, on large data sets, using parallel computing in a distributed manner. We also reduce the dimensionality of data sets with a very large number of features by removing redundant features with a feature selection method (F-score) [1], which improves accuracy and reduces the time taken for the training phase of the multi-label classification algorithm. The results show the benefits of parallel processing over traditional single-node execution, evaluated on five benchmark multi-label data sets, in terms of both accuracy and speedup.
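
As an illustration of the F-score feature selection step mentioned above, the following is a minimal sketch of how features could be ranked and filtered before training. The function names, the averaging of per-label scores for the multi-label case, and the NumPy-based implementation are assumptions made for illustration; they are not taken from the paper itself.

import numpy as np

def f_score(X, y):
    # F-score of each feature for a binary label vector y:
    # ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) divided by the
    # sum of the within-class sample variances (higher = more discriminative).
    X_pos, X_neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    numer = (X_pos.mean(axis=0) - mean_all) ** 2 + (X_neg.mean(axis=0) - mean_all) ** 2
    denom = X_pos.var(axis=0, ddof=1) + X_neg.var(axis=0, ddof=1)
    return numer / (denom + 1e-12)  # small epsilon avoids division by zero

def select_features(X, Y, keep=100):
    # Hypothetical multi-label extension: average the F-score of each feature
    # over all label columns of Y and keep the `keep` highest-scoring features.
    scores = np.mean([f_score(X, Y[:, j]) for j in range(Y.shape[1])], axis=0)
    return np.argsort(scores)[::-1][:keep]

In this sketch, X is the pattern-by-feature matrix and Y the pattern-by-label indicator matrix; the returned indices would be used to drop redundant columns of X before the (parallel) training phase.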
