Production data quality is a topic of general interest in the oil industry. In mature fields, production data are often the only information available in sufficient quantity. With the development of information systems, oil companies now hold large volumes of data, but not all of it is reliable. Errors may have various origins, starting with problems in data acquisition systems. Contaminated data may cause operational failures and lead to poor decisions. To determine the quality of a production dataset from an oil field, we applied a methodology for identifying contaminated data. The methodology is based on data mining techniques and combines a fuzzy classification algorithm, neural network modeling and an iterative process. It was applied to a real case, a database from an offshore field in Mexico, and the result was a classification of the data as good, slightly contaminated or bad. The decline behavior of a well was then evaluated using the good and slightly contaminated data, and the results were appropriate. We conclude that this classification methodology, based on intelligent algorithms, provides a simple solution to the problem of determining the quality of production data. We found that the methodology can be applied to any dataset. There is, however, a degree of subjectivity in the methodology: changing the restriction criteria used for classification may change the resulting data quality determination. Furthermore, applying the proposed methodology showed that data mining tools are effective in estimating missing data.