_This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 213869, “Water Digital Avatar—Where Chemistry Is Mixed With Machine Learning,” by Jesse Farrell, SPE, and Sergey Makarychev-Mikhailov, SPE, SLB. The paper has not been peer reviewed._

Water affects almost every operation in the exploration and production industry. Until now, time-intensive laboratory tests or cumbersome third-party simulators were required to extract physicochemical properties. In the complete paper, a family of machine-learning-based reduced-order models (ROMs) trained on rigorous first-principle thermodynamic simulation results is presented. The developed ROMs that predict water properties enable automated decision-making and improve water-management work flows. The presented approach can be extended to other oilfield, chemical, and chemical-engineering applications.

Introduction

The properties of the water phase and all produced fluids directly influence flow assurance, three-phase flow pressure/volume/temperature modeling, and fluid-compatibility aspects across the full life cycle of the well, including during well construction, stimulation, and production operations. Modeling these systems enables one to predict, mitigate, and, in some cases, completely prevent deleterious effects in tubing, the reservoir, and near-wellbore regions.

First-principle thermodynamic simulations are often considered to be ground truth in the oilfield industry and are widely used in place of time-consuming laboratory experiments. The use of rigorous thermodynamic software, however, is not always practical. In certain cases, commercial and open-source simulators are overloaded with functionality unnecessary for a task, require training of personnel, and are often difficult to incorporate into digital work flows in the cloud or to deploy at the edge on surface equipment or the downhole tools available to model changing conditions in real time.
This is where ROMs find their application because they are fast, reasonably accurate, and highly customizable solutions. The authors’ scientific hypothesis was that machine-learning-based ROMs can be used to quantify the physicochemical properties and scaling tendencies of oilfield waters in place of rigorous, first-principle thermodynamic models.

Methods

The United States Geological Survey (USGS) Produced Waters Geochemical Database was used as the initial data source for this study. The original database contains approximately 115,000 produced-water and other deep-formation water samples collected and characterized in the United States in the past 120 years. Although the USGS database is an excellent source, the data still required cleaning and enrichment.

For the first step, inconsistent, poorly populated, and outlier samples were removed, which led to noticeable data attrition and resulted in approximately 85,000 “clean” records. The enrichment phase included several manipulations and alterations of the original data. The resulting data set is not always accurate regarding individual samples but is much better populated than the original one in terms of minor ion concentrations, and is believed to be representative on a large scale.
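
The synopsis does not state which criteria the authors used to flag a sample as inconsistent, poorly populated, or an outlier. The Python sketch below illustrates one way such a cleaning pass might look, under assumptions that are not taken from the paper: a charge-balance check stands in for "inconsistent," a minimum count of reported major ions stands in for "poorly populated," and an upper bound on summed ion concentration stands in for "outlier." The ion list, the function names, and every threshold are hypothetical.

```python
# Molar mass (g/mol) and charge for a few major ions.
# The choice of ions is an assumption made for this illustration.
IONS = {
    "Na": (22.99, +1),
    "Ca": (40.08, +2),
    "Mg": (24.305, +2),
    "Cl": (35.45, -1),
    "SO4": (96.06, -2),
    "HCO3": (61.02, -1),
}


def charge_imbalance(sample):
    """Relative charge imbalance |cations - anions| / (cations + anions).

    `sample` maps ion names to concentrations in mg/L; missing ions are skipped.
    Returns None when no listed ion is reported.
    """
    cations = anions = 0.0
    for ion, (molar_mass, charge) in IONS.items():
        conc = sample.get(ion)
        if conc is None:
            continue
        meq = conc / molar_mass * abs(charge)  # milliequivalents per liter
        if charge > 0:
            cations += meq
        else:
            anions += meq
    total = cations + anions
    if total == 0:
        return None
    return abs(cations - anions) / total


def is_clean(sample, min_ions=4, max_imbalance=0.15, max_tds=400_000):
    """Apply three hypothetical filters; all thresholds are placeholders."""
    reported = [ion for ion in IONS if sample.get(ion) is not None]
    if len(reported) < min_ions:  # poorly populated
        return False
    tds = sum(sample[ion] for ion in reported)  # crude sum of ions, mg/L
    if tds <= 0 or tds > max_tds:  # outlier
        return False
    imbalance = charge_imbalance(sample)  # inconsistent
    return imbalance is not None and imbalance <= max_imbalance


def clean(samples, **thresholds):
    """Keep only the samples that pass every filter."""
    return [s for s in samples if is_clean(s, **thresholds)]
```

Calling `clean(records)` on a list of dictionaries such as `{"Na": 10000, "Ca": 1000, "Mg": 200, "Cl": 17800}` returns the subset that passes all three filters. Keeping the thresholds as keyword arguments makes it easy to see how much attrition each choice causes before settling on values.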