With the rapid development of public transportation, traffic flow prediction has become a crucial issue, especially estimating the number of passengers using the Mass Rapid Transit (MRT) system. In general, predicting passenger flow is a time-series problem that requires external information to improve accuracy. Because many MRT passengers take cars or buses to MRT stations, this study used external information from vehicle detection (VD) devices to improve the prediction of passenger flow. This study proposed a deep learning architecture, called the multiple-attention deep neural network (MADNN) model, that uses historical MRT passenger flow and the flow from surrounding VD devices and estimates the weights of those VD devices. The model consists of (1) an MRT attention layer (MRT-AL) that generates hidden features for MRT stations, (2) a surrounding VD (SVD) attention layer (SVD-AL) that generates hidden features for SVD devices, and (3) an MRT-SVD attention layer (MRT-SVD-AL) that generates attention weights for each VD device associated with an MRT station. The results indicated that the MADNN model outperformed models without multiple-attention mechanisms in predicting MRT passenger flow.
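The idea of weighting each VD device for a given MRT station can be illustrated with a minimal sketch. It is not the MADNN implementation: the scoring function (scaled dot product followed by a softmax), the function name `vd_attention`, and the toy feature vectors are all assumptions made for illustration, since the text above does not specify how the attention weights are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vd_attention(station_feature, vd_features):
    """Hypothetical attention step (assumed scaled dot-product scoring).

    station_feature: hidden feature vector of one MRT station.
    vd_features: hidden feature vectors of its surrounding VD devices.
    Returns (weights, context): one weight per VD device, and the
    weighted sum of the VD feature vectors.
    """
    dim = len(station_feature)
    scores = [dot(station_feature, v) / math.sqrt(dim) for v in vd_features]
    weights = softmax(scores)
    context = [
        sum(w * v[i] for w, v in zip(weights, vd_features))
        for i in range(dim)
    ]
    return weights, context


# Toy values, invented for illustration only.
station_feature = [1.0, 0.0]
vd_features = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
weights, context = vd_attention(station_feature, vd_features)
```

In this toy case the first and third devices have features aligned with the station feature, so they receive equal weights that are larger than the weight of the second device. The weights sum to one, which makes them readable as the relative contribution of each VD device.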