Accurate machine learning predictions of passenger flow in mass rapid transit (MRT) systems can considerably improve operational efficiency by enabling better allocation of train and human resources. However, such predictions are challenging because MRT networks have complex structures with route dependence and transfer stations. Although previous studies have computed the static state of an MRT network, a comprehensive understanding of the network requires characterizing its dynamics. Therefore, this paper proposes a dynamic traffic network representation (DTNR) model that captures station features from historical traffic flows and the geographical information of MRT stations. Furthermore, a multilevel attention network (MLAN) model is proposed to predict MRT passenger flow as a downstream task following the pretraining of the DTNR model. The experimental results indicate that the DTNR and MLAN models can accurately predict MRT passenger flow. These models are applicable to different MRT systems and passenger flow situations, making them valuable tools for transportation planners and operators.