Forecasting urban traffic states is crucial to transportation network monitoring and management and plays an important role in decision-making. Despite substantial progress in developing accurate, efficient, and reliable algorithms for traffic forecasting, most existing approaches fail to handle sparsity, high dimensionality, and nonstationarity in traffic time series and seldom consider the temporal dependence between traffic states. To address these issues, this work presents a Hankel temporal matrix factorization (HTMF) model that uses the Hankel matrix in the lower-dimensional spaces within a matrix factorization framework. In particular, we adopt an alternating minimization scheme to jointly optimize the factor matrices of the matrix factorization and the Hankel matrix in the lower-dimensional spaces. To perform traffic state forecasting, we introduce two efficient estimation processes on real-time incremental data: online imputation (i.e., reconstructing missing values) and online forecasting (i.e., estimating future data points). Through extensive experiments on the real-world Uber Movement speed data set for Seattle, Washington, we empirically demonstrate that HTMF achieves better forecasting performance than several baseline models, and we highlight its advantages in handling sparsity, nonstationarity, and short time series.

History: Accepted by Ram Ramesh, Area Editor for Data Science & Machine Learning.

Funding: This research was supported by the Institute for Data Valorisation, the Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation, the National Natural Science Foundation of China [Grants 12371456, 72101049, 72232001], the Sichuan Science and Technology Program [Grant 2024NSFJQ0038], and the Fundamental Research Funds for the Central Universities [Grant DUT23RC(3)045].
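The Hankel structure at the core of HTMF can be illustrated with a minimal NumPy sketch. It is only the generic Hankel (delay-embedding) construction on a single series, together with anti-diagonal averaging to map a matrix back to a series. It is not the HTMF model or its alternating minimization, which apply the Hankel matrix in the lower-dimensional factor space. The function names `hankelize` and `dehankelize` and the window length `tau` are hypothetical choices made for this illustration.

```python
import numpy as np


def hankelize(x: np.ndarray, tau: int) -> np.ndarray:
    """Build the tau x (T - tau + 1) Hankel matrix H with H[i, j] = x[i + j]."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    if not 1 <= tau <= T:
        raise ValueError("window length tau must satisfy 1 <= tau <= len(x)")
    # Row i is the sliding window x[i : i + T - tau + 1], so every
    # anti-diagonal (constant i + j) holds a single value of x.
    return np.stack([x[i:i + T - tau + 1] for i in range(tau)])


def dehankelize(H: np.ndarray) -> np.ndarray:
    """Map a (possibly non-Hankel) matrix back to a series by averaging anti-diagonals."""
    tau, n = H.shape
    T = tau + n - 1
    sums = np.zeros(T)
    counts = np.zeros(T)
    for i in range(tau):
        # Entry H[i, j] contributes to time index i + j.
        sums[i:i + n] += H[i]
        counts[i:i + n] += 1
    return sums / counts


t = np.arange(50)
x = np.sin(0.3 * t)
H = hankelize(x, tau=10)
print(H.shape)  # (10, 41)
# sin(w(i + j)) = sin(wi)cos(wj) + cos(wi)sin(wj), so this Hankel matrix has rank 2.
s = np.linalg.svd(H, compute_uv=False)
print(s[2] / s[0] < 1e-10)  # True
print(np.allclose(dehankelize(H), x))  # True
```

Anti-diagonal averaging is the orthogonal (Frobenius-norm) projection onto Hankel matrices, which is why it is a common way to recover a series from an estimated Hankel matrix. The rank-2 check shows how temporal regularity appears as low rank in the Hankel matrix.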