To capture and integrate the structural and temporal features contained in the social graph and the diffusion cascade more effectively, an information diffusion prediction model based on the Transformer and the Relational Graph Convolutional Network (TRGCN) is proposed. First, a dynamic heterogeneous graph is constructed from the social network graph and the diffusion cascade graph and fed into a Relational Graph Convolutional Network (RGCN) to extract the structural features of each node. Second, the time embedding of each node is re-encoded with a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and a time decay function is introduced to assign different weights to nodes at different time positions, yielding the temporal features of the nodes. Finally, the structural and temporal features are input into a Transformer and fused into spatiotemporal features, which are used for information diffusion prediction. Experimental results on three real-world datasets, X (formerly known as Twitter), Douban, and Memetracker, show that, compared with the best-performing baseline in the comparison experiments, TRGCN improves Hits@100 by 4.16% and MAP@100 by 13.26% on average. These results support the validity and rationality of the model.
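
To make the two reported metrics concrete, the following is a minimal sketch of Hits@k and MAP@k. It assumes the common next-user prediction setup in which each prediction step has exactly one ground-truth user, so that average precision reduces to the reciprocal rank of that user within the top k. The abstract does not state the evaluation protocol, so this setup, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def hits_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Return 1.0 if the true next user is among the top-k candidates, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ap_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Average precision at k with a single relevant item (assumed setup):
    1 / rank if the target is in the top k, otherwise 0."""
    for rank, user in enumerate(ranked[:k], start=1):
        if user == target:
            return 1.0 / rank
    return 0.0


def evaluate(
    predictions: List[Sequence[int]], targets: List[int], k: int
) -> Tuple[float, float]:
    """Average Hits@k and MAP@k over all prediction steps."""
    n = len(targets)
    hits = sum(hits_at_k(r, t, k) for r, t in zip(predictions, targets)) / n
    map_k = sum(ap_at_k(r, t, k) for r, t in zip(predictions, targets)) / n
    return hits, map_k


# Hypothetical toy data: each row is a ranked list of candidate user ids.
predictions = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]
targets = [3, 3, 5]  # true next user at each step
hits, map_k = evaluate(predictions, targets, k=2)
print(f"Hits@2 = {hits:.4f}, MAP@2 = {map_k:.4f}")
```

In this toy example the target is ranked first in step one, second in step two, and is absent in step three. Hits@k only records whether the target appears in the top k, whereas MAP@k also rewards ranking it higher.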