Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation

Aditya Dubey, Akhtar Rasool

doi:10.14569/ijacsa.2020.0111186

Abstract

In the era of big data, a significant amount of data is produced in many application areas. However, for various reasons, including sensor failures, communication failures, environmental disruptions, and human errors, missing values occur frequently. Missing values in the observed data pose a challenge for downstream data mining approaches, so they must be handled at the preprocessing stage of data mining. Several approaches for handling missing data have been proposed in the past. These approaches consider the whole dataset when making a prediction, which makes the imputation process cumbersome. This paper proposes a procedure that uses the local similarity structure of the dataset for imputation. K-means clustering combined with weighted KNN is used to impute the missing values efficiently. The results are compared against imputation by mean substitution and Fuzzy C-Means (FCM). The results show that the proposed imputation technique performs better than the other imputation procedures.
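The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how incomplete rows are assigned to clusters, which distance is used, or how neighbors are weighted. The sketch assumes a temporary mean fill before clustering, Euclidean distance over the observed columns, and inverse-distance weights. All function names (`mean_fill`, `kmeans`, `cluster_knn_impute`) and the toy data are hypothetical.

```python
import numpy as np


def mean_fill(X):
    """Baseline: replace each NaN with the mean of the observed values in its column."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a complete matrix; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return labels


def cluster_knn_impute(X, k_clusters=2, k_neighbors=3, seed=0):
    """Impute NaNs from the nearest rows inside the same k-means cluster."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    rough = mean_fill(X)  # assumption: temporary fill so every row can be clustered
    labels = kmeans(rough, k_clusters, seed=seed)
    result = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        same = np.where(labels == labels[i])[0]
        same = same[same != i]  # other rows in the same cluster
        for j in np.where(missing[i])[0]:
            donors = same[~missing[same, j]]  # rows that observed column j
            if len(donors) == 0 or not obs.any():
                result[i, j] = rough[i, j]  # fall back to the column mean
                continue
            # Distance over the columns row i observed; donor gaps use the rough fill.
            d = np.linalg.norm(rough[donors][:, obs] - X[i, obs], axis=1)
            nearest = np.argsort(d)[:k_neighbors]
            w = 1.0 / (d[nearest] + 1e-9)  # assumption: inverse-distance weights
            result[i, j] = np.sum(w * X[donors[nearest], j]) / np.sum(w)
    return result


# Toy data: two well-separated groups, with one value removed from the first row.
X_true = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [10.0, 10.1], [10.2, 10.0], [10.1, 10.2]])
X_missing = X_true.copy()
X_missing[0, 1] = np.nan

imputed = cluster_knn_impute(X_missing, k_clusters=2, k_neighbors=3)
baseline = mean_fill(X_missing)
print("cluster + weighted KNN:", imputed[0, 1], "| column mean:", baseline[0, 1])
```

On this toy data the column mean is pulled toward the second group, while the cluster-restricted estimate draws only on rows that resemble the incomplete one. This is the local similarity idea the abstract refers to, shown on synthetic values rather than the paper's datasets.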

Highlights

Since the age of big data began, the collection of data from various sources and the resulting volume of data have grown enormously [1].
Multivariate datasets are prevalent in several real-world applications, such as electrical system analysis, meteorological or economic strategy planning, security control, and many more.
Multiple sensors are deployed to produce datasets, and they typically have one goal: to generate data as activity occurs.

Summary

Introduction

Since the age of big data began, the collection of data from various sources and the resulting volume of data have grown enormously [1]. Multiple sensors are deployed to produce datasets, and they typically have one goal: to generate data as activity occurs. In a power grid application, several sensors diagnosing the state of power transformers produce data by monitoring the state of gases over time [2]. In the era of IoT, a vast number of sensors are used to record multivariate environmental conditions, for example air or water pollution [3]. One major issue handled in the preprocessing step is missing values. The raw dataset generated by a sensor network typically includes missing values due to rough working conditions or uncontrolled factors such as adverse weather, infrastructure malfunctions, or unstable signals. The problem of missing data is prevalent in many applications. As a result, the observed data cannot be evaluated because the datasets are incomplete.

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2020
Citations: 6
License type: CC BY
