Accurate prediction of a roller's remaining useful life (RUL) is essential for a hot strip mill to avoid major safety accidents and substantial economic losses. Because roller degradation exhibits multi-stage characteristics, traditional single-stage models cannot produce accurate RUL predictions. This paper therefore proposes an adaptive staged RUL prediction method based on a multi-scale long short-term memory network with multi-head self-attention (LSTM-MHA). The roller's production data and operation data are fused to construct interpretable health indicators (HIs) that represent the roller's remaining rotatable angle. The degradation process is then divided into multiple stages, which are analyzed individually. Through its multi-scale memory structure, the proposed multi-scale LSTM-MHA adaptively updates the model weights as the roller passes through different degradation stages. The embedded MHA mechanism filters important temporal information for the LSTM units. Given the current roller's HI as input, the trained model predicts the roller's RUL. The proposed method is verified on an industrial hot strip mill roller dataset from a well-known steel company. The validation results show that the prediction accuracy of the proposed method exceeds 98.98%. Compared with existing deep learning-based methods, the proposed multi-scale LSTM-MHA method shows significant advantages in roller RUL prediction.
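To make the data flow concrete, the following is a minimal NumPy sketch of a generic LSTM-with-self-attention RUL regressor. It is not the authors' multi-scale LSTM-MHA: the multi-scale memory structure and the stage-adaptive weight updates are not reproduced. It assumes a window of scalar HI values is embedded linearly, passed through multi-head self-attention with a residual connection (one reading of "MHA filters temporal information for LSTM units"), fed to a single-layer LSTM, and mapped to a scalar RUL by a linear head. All function names (`predict_rul`, `init_params`, etc.), dimensions, and the untrained random weights are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over time. x: (T, d_model) -> (T, d_model)."""
    t, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split the feature axis into heads: (num_heads, T, d_head).
    q = (x @ wq).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)
    weights = softmax(scores, axis=-1)  # each row sums to 1 over time steps
    heads = weights @ v  # (H, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(t, d_model)
    return concat @ wo


def lstm_forward(x, w, u, b, hidden):
    """Single-layer LSTM over x: (T, d_in); returns the final hidden state (hidden,)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ w + h @ u + b  # (4 * hidden,) gate pre-activations
        i = sigmoid(z[:hidden])  # input gate
        f = sigmoid(z[hidden:2 * hidden])  # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
        o = sigmoid(z[3 * hidden:])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def init_params(d_model=8, hidden=16, seed=0):
    # Hypothetical untrained weights; a real model would learn these from HI data.
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "w_in": rng.normal(0, s, (1, d_model)),
        "wq": rng.normal(0, s, (d_model, d_model)),
        "wk": rng.normal(0, s, (d_model, d_model)),
        "wv": rng.normal(0, s, (d_model, d_model)),
        "wo": rng.normal(0, s, (d_model, d_model)),
        "w_lstm": rng.normal(0, s, (d_model, 4 * hidden)),
        "u_lstm": rng.normal(0, s, (hidden, 4 * hidden)),
        "b_lstm": np.zeros(4 * hidden),
        "w_out": rng.normal(0, s, (hidden, 1)),
        "b_out": np.zeros(1),
    }


def predict_rul(hi_window, params, num_heads=2):
    # Embed the scalar HI sequence: (T,) -> (T, d_model).
    x = np.asarray(hi_window, dtype=float).reshape(-1, 1) @ params["w_in"]
    # Attention re-weights time steps before the LSTM (residual connection assumed).
    x = x + multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads
    )
    h = lstm_forward(
        x, params["w_lstm"], params["u_lstm"], params["b_lstm"], params["u_lstm"].shape[0]
    )
    # Linear head maps the final hidden state to a scalar RUL estimate.
    return float((h @ params["w_out"] + params["b_out"])[0])


params = init_params()
rul = predict_rul([1.0, 0.95, 0.9, 0.82, 0.75], params)
```

With untrained weights the returned value is meaningless; the sketch only shows how an HI window moves through attention, recurrence, and a regression head. Training (e.g. minimizing squared error against labelled RUL) and any stage-dependent weight adaptation are left out.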