This article, written by Special Publications Editor Adam Wilson, contains highlights of paper SPE 190087, “Unsupervised Statistical Learning With Integrated Pattern-Based Geostatistical Simulation,” by Q. Li and R. Aguilera, SPE, University of Calgary, prepared for the 2018 SPE Western Regional Meeting, Garden Grove, California, USA, 22–27 April. The paper has not been peer reviewed.

This paper presents a new geostatistics modeling methodology that connects geostatistics and machine-learning methodologies, uses nonlinear topological mapping to reduce the original high-dimensional data space, and uses unsupervised-learning algorithms to bypass problems with supervised-learning algorithms. The algorithm presented is a neural topology-preserving pattern-based geostatistical simulation algorithm that integrates the self-organizing map (SOM) concept and its updated version, the growing self-organizing map (GSOM), with an unsupervised competitive learning structure.

Introduction

In oil and gas reservoir modeling, any model construction faces challenges of limited data to some extent. The heuristic behind all geostatistical techniques is the implicit existence of statistical relationships among available data. “Data” here is a broad term; it could be discrete points, such as porosity or permeability at certain locations, but it also could be training images (TIs), which are used in this work. Using TIs as input data originated with multiple-point geostatistics. The aim was to overcome the limitations of using traditional two-point statistical variograms to describe geological continuity, especially in the case of curvilinear structures, which are quite common in nature, such as in fracture networks and geological fluvial structures.

The authors write that geostatistics could benefit from the machine-learning or statistical-learning areas. Machine-learning tasks can be divided into two protocols, supervised learning and unsupervised learning.
The difference depends on whether the input data carry correct labels. For the investigation considered in this paper, after image patches were retrieved from TIs, machine-learning algorithms were used to cluster those patches into different classes. If the correct cluster to which each image patch belongs is known beforehand, the data can be said to have correct labels, and a large number of these labeled data could be used to train a model. Supervised learning occurs when the model is tuned under the guidance of these correct labels through an error-correction process. Unsupervised learning, on the other hand, occurs when neither the number of clusters nor the correct cluster for each image patch is known. Here, no correct labels are available for guidance, so the problem is identified as an unsupervised-learning problem. This lack of labels is the first major drawback of applying machine-learning algorithms to geostatistical simulations.

Two other important issues are encountered when performing pattern-based geostatistical simulations. The first is the large number of image patches: the image-patch database contains high pattern redundancy, which could make pattern-similarity comparison inefficient. The second is the high dimensionality of each image patch, which typically could be described by low-dimensional nonlinear manifolds. To model and visualize these high-dimensional data, a nonlinear dimensionality-reduction method should be sought.
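
As a rough illustration of the ideas above, and not the authors' algorithm, the following Python sketch extracts image patches from a training image and groups them without labels using a plain fixed-size SOM. Everything in it is an assumption made for illustration: the synthetic striped training image, the function names (`extract_patches`, `train_som`, `assign_clusters`), the 5×5 patch size, the 4×4 map, and the learning-rate and neighborhood schedules. The growing variant (GSOM) described in the paper is not implemented.

```python
import numpy as np


def extract_patches(ti, size):
    """Slide a size x size window over a 2D training image; return flattened patches."""
    rows, cols = ti.shape
    patches = [
        ti[i:i + size, j:j + size].ravel()
        for i in range(rows - size + 1)
        for j in range(cols - size + 1)
    ]
    return np.asarray(patches, dtype=float)


def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small rectangular SOM by unsupervised competitive learning."""
    rng = np.random.default_rng(seed)
    n_nodes = grid[0] * grid[1]
    # Coordinates of each node on the 2D map lattice.
    coords = np.array(
        [(r, c) for r in range(grid[0]) for c in range(grid[1])], dtype=float
    )
    # Initialize node prototypes from randomly chosen patches.
    picks = rng.choice(len(data), n_nodes, replace=len(data) < n_nodes)
    weights = data[picks].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            x = data[idx]
            # Competition: the best-matching unit is the closest prototype.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Cooperation: nodes near the winner on the lattice also move.
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_clusters(data, weights):
    """Label each patch with the index of its nearest SOM node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Hypothetical binary training image: diagonal stripes on a 20 x 20 grid.
ti = (np.add.outer(np.arange(20), np.arange(20)) % 6 < 3).astype(float)
patches = extract_patches(ti, size=5)   # each patch is a 25-dimensional vector
weights = train_som(patches)            # 16 prototypes on a 4 x 4 map
labels = assign_clusters(patches, weights)
```

In this toy setup, each 5×5 patch is a point in a 25-dimensional space, and the SOM maps it to one node of a two-dimensional lattice, which is one simple form of nonlinear, topology-preserving dimensionality reduction. The striped image is deliberately redundant (many of its patches are identical), so comparing a new pattern against the 16 prototypes rather than against every patch hints at how clustering could reduce the cost of pattern-similarity searches.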