The application of remote sensing technology for water quality monitoring has attracted much attention recently. Remote sensing inversion in coastal waters with complex hydrodynamics for non-optically active parameters such as total nitrogen (TN) and total phosphorus (TP) remains a challenge. Existing studies build the relationships between remote sensing spectral data and TN/TP directly or indirectly via the mediation of optically active parameters (e.g., total suspended solids). Such models are often prone to overfitting, performing well with the training set but underperforming with the testing set, even though both datasets are from the same region. Using the Hong Kong coastal region as a case study, we address this issue by incorporating spatial covariates such as hydrometeorological and locational variables as additional input features for machine learning-based inversion models. The proposed model effectively alleviates overfitting while maintaining a decent level of accuracy (R2 exceeding 0.7) during the training, validation and testing steps. The gap between model R2 values in training and testing sets is controlled within 7%. A bootstrap uncertainty analysis shows significantly improved model performance as compared to the model with only remote sensing inputs. We further employ the Shapely Additive Explanations (SHAP) analysis to explore each input’s contribution to the model prediction, verifying the important role of hydrometeorological and locational variables. Our results provide a new perspective for the development of remote sensing inversion models for TN and TP in similar coastal waters.