The emergence of cloud computing, the industrial internet of things (IIoT), and new machine learning techniques has shown the potential to advance prognostics and health management (PHM) in smart manufacturing. While model-based PHM techniques provide insight into the progression of faults in mechanical components, developing predictive models requires certain assumptions about the underlying physical mechanisms of fault development. In situations where adequate prior knowledge of the underlying physics is lacking, data-driven PHM techniques have been increasingly applied in smart manufacturing. One limitation of current data-driven methods is that large volumes of training data are required to make accurate predictions. Consequently, computational efficiency remains a primary challenge, especially when large volumes of sensor-generated data must be processed in real-time applications. The objective of this research is to introduce a cloud-based parallel machine learning algorithm that can train large-scale predictive models more efficiently. The random forests (RFs) algorithm is parallelized using the MapReduce data processing scheme. The MapReduce-based parallel random forests (PRFs) algorithm is implemented on a scalable cloud computing system with varying combinations of processors and memory. The effectiveness of this new method is demonstrated using condition monitoring data collected from milling experiments. By implementing RFs in parallel on the cloud, a significant increase in processing speed (a 14.7-fold speedup in training) has been achieved, together with high prediction accuracy for tool wear (an eightfold reduction in mean squared error (MSE)).
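
The MapReduce parallelization of RFs summarized above can be illustrated with a minimal sketch. It is not the implementation used in this research. It assumes that each map task trains one tree on a bootstrap sample and that the reduce step merges the trained trees into a single forest. For brevity, bagged depth-1 regression stumps on a single feature stand in for full random forest trees, a local thread pool stands in for cloud workers, and the data are synthetic toy values rather than the milling measurements. All function names (`fit_stump`, `map_train`, `reduce_merge`, `train_parallel_forest`, `predict`) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def fit_stump(rows):
    """Fit a depth-1 regression tree on (x, y) rows with a scalar feature x."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i][0] == rows[i - 1][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in rows[:i]]
        right = [y for _, y in rows[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (rows[i - 1][0] + rows[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:  # every x identical: fall back to the sample mean
        mean = sum(y for _, y in rows) / len(rows)
        return (float("inf"), mean, mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def map_train(task):
    """Map step: draw a bootstrap sample and train one tree on it."""
    seed, data = task
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return [fit_stump(sample)]  # a partial forest containing one tree


def reduce_merge(forest_a, forest_b):
    """Reduce step: merge two partial forests."""
    return forest_a + forest_b


def train_parallel_forest(data, n_trees=20, workers=4):
    """Train trees concurrently (map), then combine them (reduce)."""
    tasks = [(seed, data) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_forests = list(pool.map(map_train, tasks))
    return reduce(reduce_merge, partial_forests, [])


def predict(forest, x):
    """Average the predictions of all trees in the forest."""
    votes = [lm if x <= t else rm for t, lm, rm in forest]
    return sum(votes) / len(votes)


# Synthetic toy data (an assumption for illustration): a step in y at x = 10.
toy_data = [(float(x), 0.0 if x < 10 else 1.0) for x in range(20)]
forest = train_parallel_forest(toy_data)
print(predict(forest, 2.0), predict(forest, 17.0))
```

Because the trees are trained independently of one another, the map tasks need no communication until the merge. This independence is what makes RFs a natural fit for MapReduce-style scaling across additional processors.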