In this study, four machine learning models are applied to estimate the water level variations recorded at three buoy stations during the 2022 Tonga tsunami. A new model performance evaluation metric, the lag degree, is introduced to quantify the extent of the lag, and thus the lag failure, between the estimated and observed time series. This compensates for a limitation of conventional evaluation metrics such as RMSE and R2, which do not specify this lag. The long short-term memory (LSTM) and gated recurrent unit (GRU) models estimate the water level variations accurately with less lag failure, outperforming the multi-layer perceptron (MLP) and random forest (RF) models. Model estimation near the volcano is less satisfactory (R2 < 0.5 after 3 time steps) than far from the volcano (R2 > 0.5 within 8 time steps for LSTM and GRU) or in the tsunami-shadowed area (R2 > 0.85 within 8 time steps). This is because both low- and high-frequency components coexist near the volcano, producing complex local water level fluctuations, whereas far from the volcano only the low-frequency component prevails. In the tsunami-shadowed area, only specific frequency components can arrive, which makes the water level relatively easy to estimate.
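The abstract does not give the formula for the lag degree, so the following Python sketch is only a hypothetical illustration of the problem it addresses, not the authors' metric. It uses a synthetic sine wave as a stand-in for a buoy record (no real data) and an "estimate" that is the same wave delayed by 3 time steps. RMSE and R2 register a moderate error but cannot say that the error is a pure time shift. A simple stand-in lag search (the assumed helper `best_lag`, which picks the shift minimizing the mean squared difference) recovers the 3-step delay.

```python
import math


def rmse(obs, est):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def r2(obs, est):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def best_lag(obs, est, max_lag):
    """Hypothetical lag estimate (NOT the paper's lag degree).

    Tries every shift k in [-max_lag, max_lag], pairs obs[t] with est[t + k]
    over the overlapping samples, and returns the k with the smallest mean
    squared difference. A positive k means the estimate trails the observation.
    """
    n = len(obs)
    best_k, best_err = 0, float("inf")
    for k in range(-max_lag, max_lag + 1):
        pairs = [(obs[t], est[t + k]) for t in range(n) if 0 <= t + k < n]
        err = sum((o - e) ** 2 for o, e in pairs) / len(pairs)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Synthetic stand-in for a buoy record: a sine wave with a 40-step period.
obs = [math.sin(2 * math.pi * t / 40) for t in range(200)]
# A "lagging" estimate: the same wave delayed by 3 time steps.
est = [math.sin(2 * math.pi * (t - 3) / 40) for t in range(200)]

print(f"RMSE = {rmse(obs, est):.3f}, R2 = {r2(obs, est):.3f}, "
      f"lag = {best_lag(obs, est, 10)} steps")
```

Here R2 is roughly 0.78 even though the estimate has exactly the right waveform, arriving 3 steps late. Only the lag search exposes this kind of lag failure, which is what a dedicated lag metric is meant to quantify.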