A Comparison of Machine Learning Methods for Extremely Unbalanced Industrial Quality Data

Pedro José Pereira, Adriana Pereira, André Pilastri, Paulo Cortez

doi:10.1007/978-3-030-86230-5_44

Abstract

The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce only a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. First, we consider two products and explore four ML algorithms: Random Forest (RF), two Automated ML (AutoML) methods and a deep Autoencoder (AE). These are combined with three training data balancing strategies: None, Synthetic Minority Oversampling Technique (SMOTE) and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU) and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF.
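To make two of the balancing strategies named in the abstract more concrete, the following is a minimal pure-Python sketch of Random Undersampling and of a SMOTE-style interpolation, applied to training data only. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data and the parameter k are hypothetical, and the oversampler assumes at least two minority rows. In practice, a library such as imbalanced-learn would likely be used instead.

```python
import random


def random_undersample(X, y, seed=0):
    """Keep every minority row (label 1) and an equally sized random
    subset of majority rows (label 0)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like_oversample(X, y, k=2, seed=0):
    """Add synthetic minority rows until both classes have the same size.

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours (squared Euclidean
    distance). Assumes at least two minority rows.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_needed = sum(1 for label in y if label == 0) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [1] * len(synthetic)


# Hypothetical toy training set: 100 records, 3 of them failures.
X = [[float(i), float(i % 5)] for i in range(100)]
y = [1 if i in (10, 50, 90) else 0 for i in range(100)]

X_ru, y_ru = random_undersample(X, y)
X_sm, y_sm = smote_like_oversample(X, y)
```

Undersampling discards most of the normal records, which reduces training cost but also removes information, while oversampling keeps all real records and enlarges the training set. Either way, the test data should keep its original unbalanced distribution so that the evaluation reflects real production conditions.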

Highlights

The Industry 4.0 concept is increasing the pressure on companies to adopt data intelligence processes in order to remain competitive in the markets in which they operate [12].
Since five runs are applied for each dataset, it is computationally costly to apply all balancing techniques and Machine Learning (ML) algorithms to all products.
The Industry 4.0 revolution is transforming manufacturing companies, which are increasingly adopting data intelligence processes in order to remain competitive in the market.

Summary

Introduction

The Industry 4.0 concept is increasing the pressure on companies to adopt data intelligence processes in order to remain competitive in the markets in which they operate [12]. Quality control is a crucial manufacturing process that can directly impact productivity by reducing costs, defective products and complaints, among others [16]. In 2016, a Kaggle challenge addressed industrial manufacturing quality prediction by using ML approaches [9, 12, 16]. Industrial quality ML prediction is addressed as a binary classification task, which is often nontrivial for two main reasons. First, the data are extremely unbalanced: normal cases can account for more than 99% of the records. Under such an extremely unbalanced distribution, ML algorithms might produce misleading results due to the usage of standard loss functions (e.g., classification accuracy), which do not correctly measure the detection of faulty products. Second, industrial quality often involves big data, due to the volume and velocity of the produced data records, which increases the computational effort required by the ML algorithms.
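The point about misleading accuracy can be illustrated with a short sketch. The data below are hypothetical (10,000 records with 7 failures, i.e., the 0.07% upper bound mentioned in the abstract) and the helper names are assumptions made for this example. A trivial model that labels every product as normal reaches a very high accuracy while detecting no faulty product at all.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 means failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of records that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Fraction of true failures that are detected (0.0 if there are none)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical dataset: 7 failures among 10,000 records (0.07%).
n_records, n_failures = 10_000, 7
y_true = [1] * n_failures + [0] * (n_records - n_failures)

# Trivial "model" that always predicts the normal class.
y_pred_all_normal = [0] * n_records

acc = accuracy(y_true, y_pred_all_normal)  # 9993 / 10000 = 0.9993
rec = recall(y_true, y_pred_all_normal)    # 0 / 7 = 0.0
print(f"accuracy={acc:.4f}, recall={rec:.4f}")
```

This is why measures that focus on the minority class, such as recall, are more informative than accuracy for this kind of data.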

Publication Date: Jan 1, 2021
Citations: 3
License type: CC BY
