Abstract

The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce only a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. First, we consider two products and explore four ML algorithms, namely Random Forest (RF), two Automated ML (AutoML) methods and a deep Autoencoder (AE), and three balancing training strategies, namely None, Synthetic Minority Oversampling Technique (SMOTE) and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU) and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF.

Highlights

  • The Industry 4.0 concept is increasing the pressure on companies to adopt data intelligence processes in order to remain competitive in the markets in which they operate [12]

  • Since five runs are applied for each dataset, it is computationally costly to apply all balancing techniques and Machine Learning (ML) algorithms to all products

  • The Industry 4.0 revolution is transforming manufacturing companies, which are increasingly adopting data intelligence processes in order to remain competitive in the market


Summary

Introduction

The Industry 4.0 concept is increasing the pressure on companies to adopt data intelligence processes in order to remain competitive in the markets in which they operate [12]. Quality control is a crucial manufacturing process that can directly impact productivity by reducing costs, defective products and complaints, among others [16]. In 2016, a Kaggle challenge addressed industrial manufacturing quality prediction by using ML approaches [9, 12, 16]. Industrial quality ML prediction is addressed as a binary classification task, which is often nontrivial for two main reasons. First, the data are extremely unbalanced: more than 99% of the records can be normal cases. Under such an extremely unbalanced distribution, ML algorithms might produce misleading results due to the usage of standard loss functions (e.g., classification accuracy), which do not correctly measure the detection of faulty products. Second, industrial quality often involves big data, due to the volume and velocity of the produced data records, which increases the computational effort required by the ML algorithms.
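A small worked example may help show why classification accuracy can mislead under this kind of imbalance. The sketch below is illustrative only and is not taken from the paper: the record count, the 0.07% failure rate (used here as a round figure matching the upper bound quoted in the abstract) and the helper names `confusion_counts`, `accuracy` and `recall` are all assumptions. It scores a trivial classifier that labels every product as normal.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks a failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of records classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual failures that were detected (0.0 if there are none)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data (assumption): 10,000 records, 7 of which are failures (0.07%).
n_records, n_failures = 10_000, 7
y_true = [1] * n_failures + [0] * (n_records - n_failures)

# Trivial classifier that predicts "normal" (0) for every record.
y_pred_all_normal = [0] * n_records

acc = accuracy(y_true, y_pred_all_normal)
rec = recall(y_true, y_pred_all_normal)
print(f"accuracy = {acc:.4f}, failure recall = {rec:.4f}")
```

Under these assumed numbers the trivial classifier is correct on 9,993 of 10,000 records, so its accuracy is very high even though it detects none of the failures. This is the sense in which accuracy does not correctly measure the detection of faulty products.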

