High-fidelity, physically based groundwater flow and solute transport models have seen limited use in seawater intrusion remediation design because coupling them with evolutionary algorithms is computationally intensive. Data-driven machine learning approaches are promising substitutes for computationally expensive numerical groundwater models within optimization because of their computing efficiency. However, machine learning surrogates may accumulate forecasting error and thereby yield infeasible optimal solutions. To achieve the desired level of fidelity, this study proposes a novel adaptive machine learning surrogate-based multiobjective optimization method for coastal aquifer desalination. An adaptive modeling algorithm is introduced to iteratively retrain poorly performing machine learning models and improve their prediction accuracy. The method is demonstrated by seeking optimal extraction and injection strategies for scavenging residual saltwater trapped in an upstream aquifer behind a subsurface dam. Two conflicting objectives are considered: minimizing the total extraction-injection rate and maximizing saltwater removal effectiveness. Three machine learning models, an artificial neural network (ANN), a Gaussian process (GP), and a response surface regression (RSR) model, are developed to replace a high-fidelity seawater intrusion model for predicting chloride concentration and salinity mass. The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is employed to derive Pareto fronts. Pareto optimal solutions obtained from the machine learning models are compared against those from the seawater intrusion model. Results indicate that the developed machine learning models not only have strong predictive capability but also maintain good-quality Pareto optimal solutions, while achieving computational savings of up to 95%. In particular, the adaptively retrained RSR model inhibits the accumulation of forecasting error and accelerates convergence to the true Pareto front.
The proposed method is found to outperform conventional methods in convergence accuracy, solution diversity, and computing efficiency.
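
The adaptive retraining idea summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the two-variable quadratic response surface, the toy function `toy_simulator` standing in for the high-fidelity seawater intrusion model, the RMSE tolerance `tol`, and the use of random check points (rather than candidate solutions proposed by NSGA-II) are all assumptions made only for illustration.

```python
import numpy as np

def toy_simulator(x):
    """Hypothetical cheap stand-in for the expensive simulator (assumption)."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2

def quad_features(x):
    """Full quadratic basis for two decision variables."""
    a, b = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), a, b, a * b, a ** 2, b ** 2])

def fit_rsr(x, y):
    """Least-squares fit of the quadratic response surface."""
    coef, *_ = np.linalg.lstsq(quad_features(x), y, rcond=None)
    return coef

def predict_rsr(coef, x):
    return quad_features(x) @ coef

def adaptive_retrain(simulator, n_init=20, n_check=10, tol=0.05,
                     max_iter=15, seed=0):
    """Refit the surrogate until its error on fresh check points is <= tol.

    Returns the last fitted coefficients, the RMSE history, and the final
    number of training samples.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=(n_init, 2))
    y_train = simulator(x_train)
    history = []
    for _ in range(max_iter):
        coef = fit_rsr(x_train, y_train)
        # Evaluate the expensive model at new points and compare.
        x_new = rng.uniform(0.0, 1.0, size=(n_check, 2))
        y_new = simulator(x_new)
        err = predict_rsr(coef, x_new) - y_new
        rmse = float(np.sqrt(np.mean(err ** 2)))
        history.append(rmse)
        if rmse <= tol:
            break  # surrogate is accurate enough; stop retraining
        # Poor performance: add the checked points to the training set.
        x_train = np.vstack([x_train, x_new])
        y_train = np.concatenate([y_train, y_new])
    return coef, history, len(x_train)
```

In a surrogate-assisted optimization, the check points would more naturally be solutions selected from the current Pareto front, so that each retraining round concentrates simulator runs where the optimizer is searching.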