Prognostics and Health Management (PHM) is recognized as an important lever for advancing predictive maintenance and thereby ensuring the reliability, availability, and safety of industrial systems. However, the effectiveness of data-driven PHM approaches depends on the quality and quantity of the available data. Exploiting multiple data sources can therefore provide more useful information than single-modal data alone. For instance, by combining condition monitoring data, images from cameras, and text from maintenance technicians’ reports, multimodal learning can provide a more comprehensive and accurate understanding of the system’s health. However, multimodal deep learning models are complex and difficult to interpret. To address this complexity, it is crucial to incorporate explainable artificial intelligence techniques that provide clear and interpretable insights into how the model makes decisions. In this light, this paper proposes applying a model-agnostic explanation approach, namely SHAP, to explain the working mechanism of multimodal learning for the prediction of industrial steam generator degradation. In particular, we determine the important features of each data modality and investigate how multimodal learning can overcome the issue of low-quality data in a single modality by drawing on the additional information from the other data modalities.