Rolling mill systems are structurally complex and operate under changing conditions, so fault diagnosis must fully account for the interdependence among the data. Although graph neural networks (GNNs) are a powerful architecture for non-Euclidean data, current methods struggle to represent the long-range dependencies in rolling mill fault vibration signals. Simply increasing the depth of a GNN is not enough to expand the model's receptive field, because deeper GNNs can suffer from vanishing gradients or over-smoothing. To address these problems, an improved graph neural network based on Graph-Transformer is proposed for diagnosing the health status of rolling mills. The method first applies sliding maximum sampling to the spectrum of the original vibration signal to improve the frequency resolution and reduce the feature dimension. Second, the relationships between fault features are characterized by constructing an affinity graph. Finally, the long-range dependencies between pairs of features are learned through the readout module and the self-attention mechanism of the Graph-Transformer, and a classifier outputs the diagnostic results. Experimental results on a rolling mill platform show that the method not only adapts to the changing working conditions of the rolling mill but also achieves excellent performance under sample imbalance and strong noise.
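
To make the three stages concrete, the following is a minimal NumPy sketch of one way they could be realized. It is not the implementation evaluated in this work. The window size, stride, Gaussian-weighted k-nearest-neighbour affinity, and single-head attention are all assumptions chosen for illustration, and the function names `sliding_max_sample`, `affinity_graph`, and `self_attention` are hypothetical.

```python
import numpy as np


def sliding_max_sample(signal, window=8, stride=8):
    """Reduce the magnitude spectrum by keeping the maximum of each window.

    Assumption: non-overlapping windows of 8 bins; trailing bins that do not
    fill a complete window are dropped.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    n_windows = (len(spectrum) - window) // stride + 1
    # Row j holds the spectrum indices covered by window j.
    idx = np.arange(window)[None, :] + stride * np.arange(n_windows)[:, None]
    return spectrum[idx].max(axis=1)


def affinity_graph(features, k=3):
    """Symmetric k-nearest-neighbour adjacency with Gaussian edge weights.

    Assumption: the kernel bandwidth is the median non-zero squared distance,
    so at least two distinct feature vectors are required.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    sigma2 = np.median(dist2[dist2 > 0])
    weights = np.exp(-dist2 / sigma2)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    # Keep the k strongest edges per node, then symmetrize the edge mask.
    keep = np.argsort(-weights, axis=1)[:, :k]
    mask = np.zeros_like(weights, dtype=bool)
    mask[np.arange(len(features))[:, None], keep] = True
    mask |= mask.T
    return weights * mask


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over all nodes.

    Every node attends to every other node, which is what lets the model
    relate features that are far apart in the graph.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v, attn
```

In this sketch each vibration sample becomes one node whose feature vector is the output of `sliding_max_sample`, `affinity_graph` supplies the edges that a GNN layer would use, and `self_attention` stands in for the global attention step. The readout module and the classifier are omitted.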