_This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 208137, “Reconstruction of Missing Segments in Well Data History Using Data Analytics,” by Yuanjun Li, SPE, and Roland Horne, SPE, Stanford University, and Ahmed Alshmakhy, SPE, ADNOC, et al. The paper has not been peer reviewed._

The problem of missing data is a common one in well-production records. An incomplete data set is commonly simplified by omitting all observations with missing values, which can lead to significant information loss. In the complete paper, the authors developed an efficient procedure that enables fast reconstruction of an entire production data set with multiple missing sections in different variables. Ultimately, the complete information can support reservoir history matching, production allocation, and the development of models for reservoir performance prediction.

Introduction

Missing values are a prevalent issue in pressure transient analysis. Their occurrence is often unavoidable and is a serious problem because many data-mining algorithms cannot work with data sets that contain missing values. Sources of missing field data include operational problems, changes in operating conditions, and variable sampling frequencies.

Imputation Technique of Missing Values

Case Deletion. If a missing-data problem exists in a multivariate series, the time series may not be evaluated appropriately. In time-series data, each record is unique, so deleting a record leaves a series with gaps that is unusable for many analyses. Furthermore, the time-series plot would be truncated and riddled with flaws. To conduct a time-series analysis, therefore, the missing observations must be estimated and reinserted. When the number of missing values in the data set is considerable, eliminating the associated cases might result in the loss of vital and valuable information.
Researchers have found that case elimination is not appropriate when the missing rate is greater than 3%.

Physics-Based Imputation. Any physics-driven diagnostic or model is based on fundamental assumptions that are intended to represent reservoir behavior but may not be correct in all situations. Data-driven models, by contrast, make no assumptions about the underlying physics and rely instead on training data. Data analytics and machine learning can therefore be an ideal substitute.

Local Statistical Approach. Local statistical methods include K-nearest-neighbor (KNN) imputation and local-least-squares imputation, two of the best-known algorithms. To impute the incomplete data, KNN imputation uses the pairwise correlation between the target gene with missing values and the K closest reference genes. Techniques in this local family impute missing values from local similarity patterns in the data set or in a specific column of data.
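
The KNN idea described above can be illustrated with a minimal sketch. This is not the authors' procedure: it swaps the pairwise correlation mentioned in the text for a simpler root-mean-square distance over the variables that two records both observe, and it fills each gap with the mean of the K closest records. The function name `knn_impute`, the column layout (pressure, rate, temperature), and the sample values are hypothetical and chosen only for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that observe that column.

    Assumption: similarity is a root-mean-square distance over the columns
    both rows observe (not the correlation measure named in the article).
    Columns are not rescaled here; in practice they usually should be.
    """
    filled = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbor must observe the missing column
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common columns, so no distance can be computed
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical records: [pressure, rate, temperature]; None marks a missing rate.
records = [
    [3000.0, 500.0, 80.0],
    [2990.0, 510.0, 80.5],
    [2980.0, None, 81.0],
    [2500.0, 900.0, 90.0],
]
completed = knn_impute(records, k=2)
print(completed[2])
```

In this toy data set the record with the missing rate is closest to the first two records, so the distant fourth record does not influence the estimate. A gap stays as `None` if no other record observes that column or shares any observed column with the incomplete record.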