With the development of technology, industrial monitoring systems can now collect video data in addition to process data. To further improve the accuracy of industrial process fault diagnosis, a fault diagnosis method based on video recognition and multi-source heterogeneous information fusion is proposed. The method accounts for the temporal characteristics of video data and for both the correlation and the inconsistency between video and process data. First, a Video Vision Transformer (ViViT) with an embedded Visual Geometry Group (VGG) network that uses the Adaptive Parameter Rectified Linear Unit (APReLU) is designed to capture changes in the video. The VGG network extracts local information, the Spatial Transformer Encoder extracts global information, and the Temporal Transformer Encoder extracts temporal features across sequential video frames. Then, a one-dimensional Residual Network (ResNet) is adopted to process the correlated process variables. Because video data and process data reflect different aspects of the industrial process, a Highway Network is applied to the concatenated features of the two sources. The Highway Network assigns larger weights to important components and smaller weights to unimportant ones. Experiments show that fault diagnosis based on video recognition and multi-source data achieves higher accuracy than diagnosis based on single-source data.
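To make the fusion step concrete, here is a minimal, untrained sketch of a highway layer applied to concatenated video and process features. It assumes the standard highway formulation y = T(x)·H(x) + (1 − T(x))·x, where the sigmoid transform gate T decides, per component, how much of the transformed signal H(x) versus the raw input to pass on. This gate is the mechanism by which important components can receive larger weights. The abstract does not give layer sizes, activations, the number of highway layers, or feature dimensions, so `HighwayLayer`, `fuse`, the ReLU choice for H, the random initialization, and the example feature vectors are all hypothetical illustrations rather than the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def affine(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


class HighwayLayer:
    """Hypothetical single highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    T(x) = sigmoid(W_T x + b_T) is the per-component transform gate and
    H(x) = relu(W_H x + b_H) is the candidate transform. Weights are random
    (untrained); in practice they would be learned with the classifier.
    """

    def __init__(self, dim, gate_bias=-1.0, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        self.w_h = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
        self.b_h = [0.0] * dim
        self.w_t = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
        # A negative gate bias initially favours carrying the input through unchanged.
        self.b_t = [gate_bias] * dim

    def gates(self, x):
        """Per-component weights in (0, 1) given to the transformed signal."""
        return [sigmoid(z) for z in affine(self.w_t, self.b_t, x)]

    def __call__(self, x):
        h = [relu(z) for z in affine(self.w_h, self.b_h, x)]
        t = self.gates(x)
        return [t_i * h_i + (1.0 - t_i) * x_i for t_i, h_i, x_i in zip(t, h, x)]


def fuse(video_features, process_features, layer):
    """Concatenate the two feature vectors, then pass them through a highway layer."""
    return layer(list(video_features) + list(process_features))


if __name__ == "__main__":
    video_feat = [0.2, -0.1, 0.5, 0.3]   # hypothetical output of the video branch
    process_feat = [1.0, 0.4]            # hypothetical output of the 1D ResNet branch
    layer = HighwayLayer(dim=len(video_feat) + len(process_feat))
    print(fuse(video_feat, process_feat, layer))
```

Because the gate interpolates between transforming and carrying each component, a strongly negative gate bias makes the layer pass the concatenated features through almost unchanged, while a strongly positive one makes it rely almost entirely on H(x). Training moves each gate between these extremes according to how useful each component is for classification.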