Abstract

Under Industry 4.0, manufacturing quality prediction has attracted growing interest from researchers and manufacturers. An analysis of previous studies on machine-learning-based quality prediction shows that high dimensionality and class imbalance are common, major problems that degrade learning performance. This work addresses both with a hybrid method: a combined Synthetic Minority Oversampling Technique (SMOTE) and Tomek links approach balances the data, and Random Forest serves as the feature selection measure to reduce dimensionality. A Fine Gaussian Support Vector Machine (Fine Gaussian SVM), trained on the representative feature set selected by the hybrid method, then predicts product quality. The experimental results demonstrate that the proposed hybrid method performs well for manufacturing quality prediction and offers a simple, quick and powerful way to address feature selection in imbalanced classification.
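To make the oversampling half of the balancing step concrete, the following is a minimal sketch of SMOTE-style interpolation, written with the Python standard library only. It is not the authors' implementation: the function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration. Each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns a list of n_synthetic new feature vectors.
    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest minority neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. The Tomek links step then cleans up pairs that sit on the class boundary.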

Highlights

  • With the advent of Industry 4.0, referred to as the fourth industrial revolution, smart factories and smart manufacturing have become a new trend that appears to be the future of industrial development.

  • Minimum Redundancy and Maximum Relevance [25], [26] uses mutual information to measure the similarity between features and the target, and aims to select a subset in which each feature has maximum relevance to the target and minimum redundancy with the other features in the subset.

  • Two different operations are performed on the sample pair in Tomek links: 1. Under-sampling: If the sample pair contains a minority-class sample from the original imbalanced data set, the majority-class sample in the pair is eliminated.
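The under-sampling operation on Tomek links can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes a binary problem, Euclidean distance, and the usual definition of a Tomek link as two samples of different classes that are each other's nearest neighbour. The function names and the toy data are hypothetical. Only the under-sampling operation quoted above is implemented, and the sketch does not distinguish original from synthetic minority samples.

```python
import math


def nearest_neighbour(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return [(i, nn[i]) for i in range(len(X))
            if i < nn[i] and nn[nn[i]] == i and y[i] != y[nn[i]]]


def undersample_with_tomek(X, y, minority_label):
    """Drop the majority-class member of every Tomek link (binary labels assumed)."""
    drop = set()
    for i, j in tomek_links(X, y):
        # Exactly one member of the pair is minority; remove the other one.
        drop.add(j if y[i] == minority_label else i)
    keep = [idx for idx in range(len(X)) if idx not in drop]
    return [X[idx] for idx in keep], [y[idx] for idx in keep]


# Toy data (made-up): class 1 is the minority; samples 3 and 4 straddle the boundary.
X = [[0.0, 0.0], [0.3, 0.0], [0.5, 0.0], [1.0, 0.0], [1.1, 0.0], [2.0, 0.0]]
y = [0, 0, 0, 0, 1, 1]

links = tomek_links(X, y)
X_clean, y_clean = undersample_with_tomek(X, y, minority_label=1)
```

In this toy set the only Tomek link is the pair of samples at indices 3 and 4, so the majority-class sample at index 3 is removed and the minority class is left intact.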

Summary

INTRODUCTION

With the advent of Industry 4.0, referred to as the fourth industrial revolution, smart factories and smart manufacturing have become a new trend that appears to be the future of industrial development. This study investigates dimension reduction through feature selection algorithms on imbalanced data, taking manufacturing quality prediction as an application example. A hybrid algorithm, RFSTL, is proposed that combines the SMOTE & Tomek links algorithm for balancing data with Random Forest for feature selection. In this way, the imbalance and high-dimensionality issues are resolved in the data and feature processing stage, before model learning. Manufacturing quality prediction is simulated as a case study (Zhou et al.: Hybrid Feature Selection Method RFSTL for Manufacturing Quality Prediction). These experiments demonstrate that, assisted by the RFSTL algorithm, conventional classification algorithms can work effectively on an imbalanced, high-dimensional dataset. The core mechanism of data reconstruction is to alter the class distributions by resampling the data, which can be divided into three categories:

Undersampling
Oversampling
Combined sampling
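As a rough illustration of how resampling alters the class distribution, the sketch below implements the two simplest variants, random undersampling and random oversampling, using only the Python standard library. The function names, the `seed` parameter and the toy data are assumptions for illustration; the paper's combined approach uses SMOTE followed by Tomek links rather than these random variants.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    keep = []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    X_out, y_out = list(X), list(y)
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement to fill the gap to the largest class.
        for i in rng.choices(idx, k=target - len(idx)):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


# Toy imbalanced data (made-up): eight majority samples, two minority samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
under_counts = Counter(y_under)
over_counts = Counter(y_over)
```

Undersampling discards majority information and oversampling by duplication can encourage overfitting, which is one motivation for combined schemes that generate new minority samples and then clean the class boundary.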
Wrapper methods for feature selection
Data cleaning
STOPPING CRITERIA FOR FEATURE SELECTION
SIMULATION AND EXPERIMENT RESULTS
Normalization
Data Balancing
Feature Selection
Performance evaluation
Findings
CONCLUSION
