Summary

Numerical optimization is an integral part of many history-matching (HM) workflows. However, optimization performance can be degraded by the numerical noise present in forward models when gradients are estimated numerically. As an unavoidable part of reservoir simulation, numerical noise refers to the error caused by incomplete convergence of linear or nonlinear solvers or by truncation errors introduced by different timestep cuts. More precisely, the allowed solver tolerances and the allowed changes of pressure and saturation imply that simulation results no longer change smoothly with changing model parameters. For HM with the linear distributed Gauss-Newton (L-DGN) method, this discontinuity of simulation results means that the sensitivity matrix computed by linear interpolation may be less accurate, which can result in slow convergence or, worse, failure to converge.

Recently, we developed an HM workflow that integrates support-vector regression (SVR) with the distributed Gauss-Newton (DGN) optimization method, referred to as SVR-DGN. Unlike L-DGN, which computes the sensitivity matrix with a simple linear proxy, SVR-DGN computes the sensitivity matrix by taking the gradient of the SVR proxies. In this paper, we provide theoretical analysis and case studies showing that SVR-DGN computes a more accurate sensitivity matrix than L-DGN and is insensitive to the negative influence of numerical noise. We also propose a cost-saving training procedure: bad training points, which correspond to relatively large values of the objective function, are replaced with training-data points (simulation data) that have smaller objective-function values and were generated at the most recent iterations before the SVR proxies are trained.

Both the L-DGN approach and the newly proposed SVR-DGN approach are first tested on a 2D toy problem to show the effect of numerical noise on their convergence performance. We find that their performance is comparable when the toy problem is free of numerical noise. As the numerical-noise level increases, the performance of L-DGN degrades sharply, whereas the performance of SVR-DGN remains quite stable. Both methods are then tested on a real-field HM example. The convergence of SVR-DGN is robust for both tight and loose numerical settings, whereas the performance of L-DGN degrades significantly when loose numerical settings are applied.
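To make the contrast concrete, the following is a minimal sketch, not the authors' implementation, of why a differentiated SVR proxy can outperform a noise-contaminated numerical gradient. It assumes an RBF-kernel SVR fitted with scikit-learn, for which the proxy f(x) = sum_i a_i exp(-gamma ||x - x_i||^2) + b has the closed-form gradient sum_i a_i (-2 gamma)(x - x_i) K(x, x_i). The function noisy_response, the noise model, and all parameter values are hypothetical stand-ins for a simulator response; the paper's SVR formulation and its coupling to DGN are more involved.

```python
# Minimal sketch (not the authors' implementation): compare the analytic
# gradient of an RBF-kernel SVR proxy with a finite-difference gradient on a
# noisy 1D response. The noise model and all parameter values are hypothetical.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def noisy_response(x, noise=1e-3):
    """Smooth 'simulator' response plus solver-like numerical noise."""
    return np.sin(3.0 * x) + x**2 + noise * rng.standard_normal(np.shape(x))

# Training points, standing in for previously run simulations.
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = noisy_response(X).ravel()

gamma = 5.0
svr = SVR(kernel="rbf", gamma=gamma, C=100.0, epsilon=1e-3).fit(X, y)

def svr_gradient(x):
    """Analytic gradient of the RBF-SVR proxy:
    f(x) = sum_i a_i * exp(-gamma * ||x - x_i||^2) + b, so
    df/dx = sum_i a_i * (-2 * gamma) * (x - x_i) * K(x, x_i)."""
    sv = svr.support_vectors_            # support vectors x_i
    a = svr.dual_coef_.ravel()           # coefficients a_i
    k = np.exp(-gamma * np.sum((x - sv) ** 2, axis=1))
    return np.sum(a * (-2.0 * gamma) * (x - sv).ravel() * k)

x0 = np.array([0.3])
h = 1e-4  # small step: finite differences amplify the numerical noise
fd_grad = (noisy_response(x0 + h) - noisy_response(x0 - h)) / (2 * h)
true_grad = 3.0 * np.cos(3.0 * x0) + 2.0 * x0  # noise-free reference

print(f"finite-difference gradient: {fd_grad[0]: .4f}")
print(f"SVR-proxy gradient:         {svr_gradient(x0): .4f}")
print(f"noise-free gradient:        {true_grad[0]: .4f}")
```

With a step size comparable to the noise amplitude, the finite-difference estimate is dominated by the noise term (of order noise/h), while the SVR proxy smooths over the noise and its analytic gradient stays close to the noise-free reference; this is the same mechanism by which SVR-DGN's sensitivity matrix resists numerical noise better than linear interpolation.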