Accurate traffic congestion estimation and prediction are critical building blocks for smart trip planning and rerouting decisions in transportation systems. Over the past decades, many studies have addressed traffic congestion estimation and prediction using statistical approaches (e.g., Markov chains) and machine learning models (e.g., clustering, Bayesian networks, and artificial neural networks). However, a unified framework that explains the mechanisms of these models and combines their respective advantages is still lacking. This paper introduces the FD-Markov-LSTM model, a hybrid interpretable approach that combines the fundamental diagram (FD), a Markov chain, and long short-term memory (LSTM). The aim is to estimate and predict traffic states by integrating statistical data in both congested and uncongested scenarios. The FD-Markov-LSTM model uses the FD to identify hierarchical traffic states and a Markov process to capture the probabilistic transitions between these states. Because the Markov chain model assumes a memoryless property, we employ an LSTM model to capture the residual time series it produces, which enhances estimation and prediction performance. The accuracy of the proposed model in estimating and predicting traffic flow is evaluated using empirical data from three case studies conducted in Beijing and Los Angeles. The results show a significant improvement in accuracy over classical benchmark models such as the Markov, ARIMA, k-Nearest Neighbor, Random Forest, and LSTM models. Specifically, the FD-Markov-LSTM model achieves reductions of over 39% in mean absolute error, 35% in root mean squared error, and 7.4% in mean absolute percentage error, enabling more precise predictions of traffic flow.
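
The pipeline described above (FD-based state identification, Markov transitions, and a residual series for the LSTM) can be illustrated with a minimal sketch. This is not the paper's implementation. The density boundaries, the use of per-state mean flow as the Markov forecast, and all function names are assumptions made for illustration only. The LSTM stage is omitted: the sketch stops at the residual series that such a model would be trained on.

```python
import numpy as np

def assign_states(density, boundaries):
    """Map each density observation to a discrete traffic state index.

    `boundaries` are hypothetical density thresholds (e.g., a critical
    density read off a fundamental diagram); they are assumptions here.
    """
    return np.digitize(density, boundaries)

def transition_matrix(states, n_states):
    """Estimate first-order Markov transition probabilities by counting."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never observed as an origin fall back to a uniform row.
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.divide(counts, row_sums, out=uniform, where=row_sums > 0)

def markov_flow_forecast(states, flow, P):
    """One-step-ahead flow forecast: expected per-state mean flow.

    Using the mean flow of each state as its representative value is an
    illustrative choice, not something stated in the abstract.
    """
    n_states = P.shape[0]
    state_mean = np.array([
        flow[states == s].mean() if np.any(states == s) else flow.mean()
        for s in range(n_states)
    ])
    # Row t is the distribution over the next state given the state at t.
    return P[states[:-1]] @ state_mean

# Toy data (made up for the example, not from the case studies).
density = np.array([10.0, 12.0, 50.0, 55.0, 11.0, 60.0])       # veh/km
flow = np.array([400.0, 450.0, 1200.0, 1100.0, 420.0, 1000.0])  # veh/h

states = assign_states(density, boundaries=[30.0])
P = transition_matrix(states, n_states=2)
forecast = markov_flow_forecast(states, flow, P)

# The residual series is what a sequence model such as an LSTM would
# then be fitted to, compensating for the memoryless Markov assumption.
residual = flow[1:] - forecast
```

In this sketch the final prediction would be the Markov forecast plus the predicted residual, so the interpretable state-transition component and the learned correction remain separable.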