A novel multi-label classification algorithm based on K-nearest neighbor and random walk

Zhen-Wu Wang, Si-Kai Wang, Ben-Ting Wan, William Wei Song

doi:10.1177/1550147720911892

Abstract

The multi-label classification problem occurs in many real-world tasks where an object is naturally associated with multiple labels, that is, concepts. Integrating the random walk approach into multi-label classification methods has attracted the attention of many researchers. One challenge for random walk-based multi-label classification algorithms is constructing the random walk graph, because the way it is built can lead to poor classification quality and high algorithmic complexity. In this article, we propose a novel multi-label classification algorithm based on the random walk graph and the K-nearest neighbor algorithm (named MLRWKNN). The method builds the vertex set of the random walk graph from the K nearest training samples of a given test instance, and the edge set from the correlations among the labels of those training samples, which considerably reduces time and space overhead. The proposed method also improves the similarity measure by treating discrete and continuous features separately and then integrating them, so that it reflects the relationships between instances more accurately. A label prediction method is devised to reduce the subjectivity of the traditional threshold method. Experimental results on four metrics demonstrate that the proposed method outperforms seven state-of-the-art multi-label classification algorithms used for comparison and significantly improves multi-label classification.
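The abstract does not give the formula for the mixed similarity measure, so the following is only a minimal sketch under a stated assumption: a Gower-style similarity that scores continuous features by range-normalized absolute difference and discrete features by exact match, averaged over all features, followed by selection of the K most similar training samples. The names `feature_ranges`, `mixed_similarity`, and `k_nearest` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Union

# A feature value is either continuous (float) or discrete (here: a string category).
Value = Union[float, str]


def feature_ranges(train: Sequence[Sequence[Value]], is_discrete: Sequence[bool]) -> List[float]:
    """Range (max - min) of each continuous feature; 0.0 as a placeholder for discrete ones."""
    ranges: List[float] = []
    for j, disc in enumerate(is_discrete):
        if disc:
            ranges.append(0.0)
        else:
            column = [float(row[j]) for row in train]
            ranges.append(max(column) - min(column))
    return ranges


def mixed_similarity(x: Sequence[Value], y: Sequence[Value],
                     is_discrete: Sequence[bool], ranges: Sequence[float]) -> float:
    """Assumed Gower-style similarity in [0, 1] for instances with mixed feature types."""
    total = 0.0
    for xi, yi, disc, r in zip(x, y, is_discrete, ranges):
        if disc:
            # Discrete feature: 1 if the categories match, else 0.
            total += 1.0 if xi == yi else 0.0
        elif r > 0:
            # Continuous feature: 1 minus range-normalized distance, clamped at 0.
            total += max(0.0, 1.0 - abs(float(xi) - float(yi)) / r)
        else:
            # Constant feature in the training data carries no information.
            total += 1.0
    return total / len(is_discrete)


def k_nearest(test: Sequence[Value], train: Sequence[Sequence[Value]], k: int,
              is_discrete: Sequence[bool], ranges: Sequence[float]) -> List[int]:
    """Indices of the k most similar training samples (ties broken by lower index)."""
    scored = [(mixed_similarity(test, row, is_discrete, ranges), i)
              for i, row in enumerate(train)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

For example, with `train = [[0.0, "a"], [1.0, "b"], [0.5, "a"], [0.25, "b"]]` and `is_discrete = [False, True]`, the test instance `[0.0, "a"]` has similarities 1.0, 0.0, 0.75, and 0.375 to the four samples, so `k_nearest` with `k=2` returns `[0, 2]`. Averaging per-feature scores keeps both feature types on the same [0, 1] scale, which is one simple way to "differentiate and integrate" them.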

Highlights

In the data mining field, the traditional binary classification and multi-class classification problems have been explored substantially
We propose a novel graph-based MLC algorithm that combines the KNN and random walk algorithms, named multi-label classification based on the random walk graph and the K-nearest neighbor algorithm (MLRWKNN)
The background of, and related work on, the random walk strategy, the KNN algorithm, and graph-based MLC algorithms are discussed in section “Related work.” Building on this previous research, we present our approach, MLRWKNN, in section “The principle of the MLRWKNN algorithm,” which consists of three components: design of the feature similarity computation, construction of the random walk graph, and label set prediction
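The graph-construction and label-prediction components can be illustrated with a hedged sketch. It relies on several assumptions the text does not state: edges between the K neighbors are weighted by the Jaccard overlap of their label sets, the walk is a random walk with restart whose restart vector is the test instance's normalized similarity to each neighbor, and the number of predicted labels is the similarity-weighted average label-set size of the neighbors instead of a fixed score threshold. The function names and the restart parameter `alpha` are hypothetical, and this is not presented as the authors' exact MLRWKNN procedure.

```python
from typing import Dict, List, Sequence, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Label-set overlap, used here as the (assumed) edge weight between two neighbors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def restart_distribution(similarities: Sequence[float]) -> List[float]:
    """Normalize test-to-neighbor similarities into a probability vector."""
    total = sum(similarities)
    k = len(similarities)
    if total <= 0:
        return [1.0 / k] * k
    return [s / total for s in similarities]


def random_walk_with_restart(neighbor_labels: Sequence[Set[int]], restart: Sequence[float],
                             alpha: float = 0.85, iters: int = 100) -> List[float]:
    """Power iteration p <- (1 - alpha) * restart + alpha * p @ T on the K-vertex graph."""
    k = len(neighbor_labels)
    trans: List[List[float]] = []
    for i in range(k):
        row = [jaccard(neighbor_labels[i], neighbor_labels[j]) if i != j else 0.0
               for j in range(k)]
        s = sum(row)
        # A vertex with no label overlap jumps according to the restart vector.
        trans.append([w / s for w in row] if s > 0 else list(restart))
    p = list(restart)
    for _ in range(iters):
        p = [(1 - alpha) * restart[j] + alpha * sum(p[i] * trans[i][j] for i in range(k))
             for j in range(k)]
    return p


def predict_labels(neighbor_labels: Sequence[Set[int]], similarities: Sequence[float],
                   alpha: float = 0.85) -> Tuple[List[int], Dict[int, float]]:
    """Score each label by the walk probability of neighbors carrying it; keep the top m."""
    restart = restart_distribution(similarities)
    p = random_walk_with_restart(neighbor_labels, restart, alpha)
    scores: Dict[int, float] = {}
    for prob, labels in zip(p, neighbor_labels):
        for label in labels:
            scores[label] = scores.get(label, 0.0) + prob
    # Assumed threshold-free rule: m = weighted average label-set size of the neighbors.
    m = round(sum(r * len(labels) for r, labels in zip(restart, neighbor_labels)))
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return sorted(ranked[:m]), scores
```

Because the graph only has K vertices, each power-iteration step costs O(K²) regardless of the training-set size, which is consistent with the paper's motivation for restricting the vertex set to the nearest neighbors.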

Summary

Introduction

In the data mining field, the traditional binary classification and multi-class classification problems have been explored substantially. The multi-label classification (MLC) problem, however, remains open, and it has recently attracted increasing research attention due to its wide range of applications, such as text classification,[1,2] gene function classification,[3] social network analysis,[4] and image/video annotation.[5] With the rapid development and deployment of wireless sensor networks (WSNs), massive data collected from a large number of monitored objects[6,7,8,9,10,11,12] are analyzed, clustered, and classified with methods such as the classic K-nearest neighbor (KNN) and support vector machine (SVM) algorithms[9,10] and MLC methods.[11,12] For example, using a wireless sensor network installed in a room to collect limb motion data, Guraliuc et al.[9] used the KNN and SVM algorithms to classify limb movements, aiming to develop a method for patient motion therapy. Zhang et al.[12] applied an MLC method to detect multiple data faults[13] (regarded as multiple labels) simultaneously in sensor networks, because it is difficult to build a detection model for each fault type.

Methods

Results

Conclusion