Data-driven methods have been revolutionizing the way physicists and engineers handle complex and challenging problems even when the physics is not fully understood. However, these models very often lack interpretability. Physics-aware machine learning (ML) techniques have been used to endow proxy models with features closely related to the ones encountered in nature; examples span from material balance to conservation laws. In this study, we proposed a hybrid-based approach that incorporates physical constraints (physics-based) and yet is driven by input/output data (data-driven), leading to fast, reliable, and interpretable reservoir simulation models. To this end, we built on a recently developed deep learning–based reduced-order modeling framework by adding a new step related to information on the input–output behavior (e.g., well rates) of the reservoir and not only the states (e.g., pressure and saturation) matching. A deep-neural network (DNN) architecture is used to predict the state variables evolution after training an autoencoder coupled with a control system approach (Embed to Control—E2C) along with the addition of some physical components (loss functions) to the neural network training procedure. Here, we extend this idea by adding the simulation model output, for example, well bottom-hole pressure and well flow rates, as data to be used in the training procedure. Additionally, we introduce a new architecture to the E2C transition model by adding a new neural network component to handle the connections between state variables and model outputs. By doing this, it is possible to estimate the evolution in time of both the state and output variables simultaneously. Such a non-intrusive data-driven method does not need to have access to the reservoir simulation internal structure, so it can be easily applied to commercial reservoir simulators. The proposed method is applied to an oil–water model with heterogeneous permeability, including four injectors and five producer wells. We used 300 sampled well control sets to train the autoencoder and another set to validate the obtained autoencoder parameters. We show our proxy’s accuracy and robustness by running two different neural network architectures (propositions 2 and 3), and we compare our results with the original E2C framework developed for reservoir simulation.