A Hybrid Feature Selection Method RFSTL for Manufacturing Quality Prediction Based on a High Dimensional Imbalanced Dataset

Hong Zhou, Huan-Po Hsu, Kun-Ming Yu, Yen-Chiu Chen

doi:10.1109/access.2021.3059298

Open Access

Abstract

Under Industry 4.0, manufacturing quality prediction has attracted growing interest from researchers and manufacturers. An analysis of previous studies on quality prediction using machine learning shows that high dimensionality and class imbalance are common, major problems that degrade learning performance. This work addresses both with a hybrid method: a Synthetic Minority Oversampling Technique (SMOTE) and Tomek links balancing approach creates balanced data, and Random Forest serves as the feature selection measure to reduce the dimensionality of the data. A Fine Gaussian Support Vector Machine (Fine Gaussian SVM), trained on the representative set of features selected by the hybrid method, is then used to predict product quality. The experimental results demonstrate that the proposed hybrid method performs well for manufacturing quality prediction and offers a simple, quick and powerful way to address the feature selection problem in imbalanced classification.
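
"Fine Gaussian SVM" is the name of a preset in MATLAB's Classification Learner. The abstract does not state the kernel settings, so the following is an assumption based on that preset, not the authors' reported configuration. Under that assumption, with P denoting the number of selected features (a symbol introduced here for illustration), the Gaussian kernel and its scale would be:

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{s^{2}}\right),
\qquad s = \frac{\sqrt{P}}{4}
```

A small kernel scale s makes the kernel narrow, so the decision boundary can follow fine-grained distinctions between classes. That also makes the classifier sensitive to irrelevant dimensions, which is one reason feature selection before training matters.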

Highlights

With the advent of Industry 4.0, referred to as the fourth industrial revolution, smart factories and smart manufacturing have become a new trend that seems to be the future of industrial development
Minimum Redundancy and Maximum Relevance [25], [26] measures the similarity between features and targets using mutual information. It aims to select a subset of features in which each feature has maximum relevance to the target and minimum redundancy with the other features in the subset
Two different operations are performed on the sample pair in Tomek links: 1. Under-sampling: If the sample pair contains the minority class sample of the original imbalanced data set, the sample belonging to the majority class in the pair is eliminated
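
The under-sampling operation on Tomek links described above can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming a binary problem, Euclidean distance, and a brute-force neighbour search. The function names (`nearest_neighbor`, `tomek_links`, `remove_majority_in_links`) are hypothetical and do not come from the paper.

```python
from math import dist


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i], excluding X[i] itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Under-sampling: drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


if __name__ == "__main__":
    # Toy data (made up for illustration): label 0 is the majority class.
    X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (5.0, 5.0)]
    y = [0, 1, 0, 0, 1]
    print(tomek_links(X, y))
    print(remove_majority_in_links(X, y, majority_label=0))
```

The brute-force search costs O(n²) distance computations, which is fine for a sketch but would need a spatial index or a library implementation on a real manufacturing dataset.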

Summary

INTRODUCTION

With the advent of Industry 4.0, referred to as the fourth industrial revolution, smart factories and smart manufacturing have become a new trend that seems to be the future of industrial development. This study investigates dimension reduction through feature selection algorithms on imbalanced data, taking manufacturing quality prediction as an application example. A hybrid algorithm, RFSTL, is proposed, based on the SMOTE & Tomek links algorithm for balancing data and Random Forest for feature selection. In this way, the imbalance and high-dimensionality issues are resolved in the data and feature processing stage, before model learning. [...] simulated to predict manufacturing quality as a case study. These experiments demonstrate that, assisted by the RFSTL algorithm, conventional classification algorithms can work effectively on an imbalanced, high-dimensional dataset. The core mechanism of data reconstruction is to alter the class distributions by resampling the data, which can be divided into three categories:

Undersampling

Oversampling

Combined sampling
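
To make the oversampling category concrete, here is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It assumes a binary problem and Euclidean distance. The function names (`smote`, `balance_with_smote`) and the parameters `k` and `seed` are hypothetical choices for illustration, not the paper's implementation.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Append synthetic minority samples until both classes are equal in size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)
    new = smote(minority, n_new, k, seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


if __name__ == "__main__":
    # Toy data (made up for illustration): six majority, two minority samples.
    X = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0),
         (7.0, 5.0), (7.0, 6.0), (0.0, 0.0), (1.0, 1.0)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    Xb, yb = balance_with_smote(X, y, minority_label=1)
    print(yb.count(0), yb.count(1))
```

In combined sampling, an oversampling step of this kind is followed by a cleaning step such as Tomek links removal, which is the pairing the RFSTL method uses for data balancing.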

Wrapper methods for feature selection

Data cleaning

STOPPING CRITERIA FOR FEATURE SELECTION

SIMULATION AND EXPERIMENT RESULTS

Normalization

Data Balancing

Feature Selection

Performance evaluation

Findings

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 48	License type: CC BY 4.0
