This paper illustrates the growing significance of data analysis in manufacturing environments through a review of relevant literature, and presents a generic framework to ease the adoption of regression-based supervised learning in such environments. To validate the practicality of the framework, several regression learning techniques are applied to an open-source multi-stage continuous-flow manufacturing process data set, demonstrating inference-driven decision-making that informs the selection of regression learning methods for real-world manufacturing environments. The techniques are evaluated in terms of training time, prediction speed, predictive accuracy (R-squared value), and mean squared error, with all statistics computed over 50 independent runs. In terms of training time (TT), k-NN20 (k-nearest neighbors with 20 neighbors) ranks first, with average and median values of 4.8 ms and 4.9 ms for the first stage of the predictive modeling and 4.2 ms and 4.3 ms for the second stage. In terms of prediction speed (PS), DTR (decision tree regressor) ranks first, with average and median values of 5.6784×10⁶ and 4.8691×10⁶ observations per second (ob/s) for the first stage and 4.9929×10⁶ and 5.8806×10⁶ ob/s for the second stage.
In terms of R-squared value (R²), BR (bagging regressor) ranks first for the first stage of the predictive modeling, with average and median values both equal to 0.728 over 50 independent runs, and RFR (random forest regressor) ranks first for the second stage, with average and median values both equal to 0.746. In terms of mean squared error (MSE), BR again ranks first for the first stage, with average and median values both equal to 2.7, and RFR ranks first for the second stage, with average and median values both equal to 3.5. All methods are then ranked inferentially using the statistics of their performance metrics to identify the best method(s) for each stage of the predictive modeling, and a Wilcoxon rank sum test is used to statistically verify these inference-based rankings. DTR and k-NN20 are identified as the most suitable regression learning techniques for the multi-stage continuous-flow manufacturing process data used in the experiments.
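To make the evaluation procedure concrete, the following is a minimal sketch of how the accuracy metrics (R² and MSE) and a two-sided Wilcoxon rank sum comparison of two methods could be computed from per-run results. It is not the implementation used in the paper. The function names (`r_squared`, `mean_squared_error`, `average_ranks`, `rank_sum_test`) are hypothetical, the test uses the normal approximation without a tie correction in the variance, and the numbers in the demonstration are made-up placeholders rather than results from the paper.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Assumption: no tie correction is applied to the
    variance, so p-values are approximate when many ties are present.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Placeholder per-run metric values for two hypothetical methods.
    method_a = [1.0, 1.2, 1.1, 0.9, 1.3]
    method_b = [2.0, 2.2, 2.1, 1.9, 2.3]
    z, p = rank_sum_test(method_a, method_b)
    print(f"mean A = {statistics.fmean(method_a):.3f}, "
          f"median A = {statistics.median(method_a):.3f}")
    print(f"z = {z:.3f}, p = {p:.4f}")
```

In a setting like the one described above, each list would hold one metric (for example TT or MSE) for one method across the 50 independent runs. A small p-value would then suggest that the difference behind a ranking is unlikely to be due to run-to-run variation alone.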