Abstract

Datasets gathered from actual systems may include missing data owing to unintentional faults, such as equipment breakdown, as well as intentional causes, such as sampling inspection. Because missing data can distort the results of an analysis, they should be addressed before the analysis is performed. Imputation replaces missing entries with values calculated from observed features, and is a more reasonable alternative to simple approaches such as complete case analysis. Although various imputation methods exist, most ignore the local space around the missing entries, which may be closely related to the missing values. Furthermore, imputation methods that do partially reflect local relationships are susceptible to overfitting and are difficult to tune, because they lack a systematic definition of the local space. Thus, we propose a composite fuzzy hyper-rectangle (H-RTGL) imputation (CFHRI) method with the following characteristics: (i) it defines the local space using an H-RTGL-based one-class classifier to thoroughly describe the data of the target class, and (ii) it imputes the missing entries using a fuzzy model comprising imputation models calculated from the H-RTGLs. These features enable CFHRI to define the local space adjacent to missing data systematically and to reduce the risk of overfitting to a particular region of the dataset. We validated our method through numerical experiments on data gathered from an actual system and compared its imputation performance with that of other imputation methods. CFHRI showed a statistically significant improvement on 5 of the 7 datasets used, with an improvement of around 10% in mean absolute error (MAE). Moreover, classification accuracy on the imputed datasets increased by 3–5%, which indicates that CFHRI can be a useful preprocessing step for datasets intended for classification.
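
The general idea of combining local models through fuzzy memberships can be illustrated with a minimal sketch. This is not the authors' CFHRI implementation: the hyper-rectangles here are given by hand rather than learned by a one-class classifier, each local imputation model is reduced to a constant vector, and the exponential membership function, the parameter `gamma`, and the names `membership`, `impute`, and `mean_absolute_error` are all assumptions made for illustration.

```python
import numpy as np

def membership(x, lo, hi, observed, gamma=1.0):
    """Fuzzy membership of x in the box [lo, hi], using observed features only.

    Assumed form: 1.0 inside the box, decaying exponentially with the
    Euclidean distance to the box outside it.
    """
    below = np.maximum(lo[observed] - x[observed], 0.0)
    above = np.maximum(x[observed] - hi[observed], 0.0)
    dist = np.linalg.norm(below + above)
    return np.exp(-gamma * dist)

def impute(x, boxes, local_values, gamma=1.0):
    """Fill NaN entries of x with a membership-weighted mix of local values.

    boxes        : list of (lo, hi) arrays, one hypothetical hyper-rectangle each
    local_values : list of arrays, a stand-in for each box's imputation model
    """
    observed = ~np.isnan(x)
    weights = np.array([membership(x, lo, hi, observed, gamma) for lo, hi in boxes])
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    filled = x.copy()
    for j in np.where(~observed)[0]:
        filled[j] = sum(w * v[j] for w, v in zip(weights, local_values))
    return filled

def mean_absolute_error(true, pred):
    """MAE between true values and imputed values."""
    return float(np.mean(np.abs(np.asarray(true) - np.asarray(pred))))

# Two hand-made boxes in 2D, each with a constant local value (toy data).
boxes = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([4.0, 4.0]), np.array([5.0, 5.0])),
]
local_values = [np.array([0.5, 0.5]), np.array([4.5, 4.5])]

x = np.array([0.5, np.nan])          # second feature is missing
filled = impute(x, boxes, local_values)
print(filled)
```

Because the sample lies inside the first box on its observed feature, that box dominates the weighted mix, while the distant box still contributes a small share. Blending several local models in this way, rather than committing to a single region, is the kind of behaviour the abstract associates with reduced overfitting.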

