Abstract

In recent years, big data and the associated analytics technologies have become increasingly popular in the E&P industry. Numerous tools such as Hadoop, HBase, Hive and Mahout have been created to store, administer and analyze big data in a variety of formats. By applying big data mining methods to E&P data, information can be extracted and knowledge discovered from large volumes of unstructured raw data. Currently, unstructured data is estimated to account for more than 80% of the total big data volume. The objective of this paper is to demonstrate how Hadoop technology can be used to extract information and discover knowledge from a huge amount of unstructured text data with low value density. To illustrate the whole procedure, more than 10,000 public SPE abstracts on the development and production of heavy oil fields are collected, pre-processed and stored in the Hadoop distributed file system. Key parameters such as field name, country, EOR method, permeability and porosity are then retrieved from these abstracts by pattern matching and statistical models. The values extracted from Hadoop are then cleaned and analyzed. Based on the extracted information, knowledge such as the top ten heavy oil fields by number of occurrences, the geographical distribution of heavy oil fields, the EOR methods applied in them, EOR history charts, the porosity distribution and a crossplot of permeability against porosity can be readily presented and visualized. To validate the effectiveness and accuracy of the knowledge discovery process, the extracted values are compared with those from a manually constructed system. The comparison shows that the information retrieval and knowledge mining processes based on Hadoop big data technology can retrieve text and numeric values from unstructured raw data in the E&P industry and yield statistical trends similar to those of the manual system.
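The pattern-matching step described above can be illustrated with a minimal Python sketch. The abstract does not give the actual extraction rules, so everything here is an assumption made for illustration: the regular expressions, the keyword list in `EOR_KEYWORDS`, and the function name `extract_parameters` are hypothetical. The sketch covers only the pattern-matching part, not the statistical models or the Hadoop storage layer.

```python
import re

# Hypothetical patterns: the paper does not publish its extraction rules.
# Matches phrases such as "porosity of 32%" or "porosity is 30 percent".
POROSITY_RE = re.compile(
    r"porosity\s+(?:of|is|was|around|about)?\s*(\d+(?:\.\d+)?)\s*(?:%|percent)",
    re.IGNORECASE,
)

# Matches phrases such as "permeability of 2,500 mD" or "permeability is 1.5 darcy".
PERMEABILITY_RE = re.compile(
    r"permeability\s+(?:of|is|was|around|about)?\s*(\d[\d,]*(?:\.\d+)?)\s*"
    r"(md|millidarcy|millidarcies|darcy|darcies)\b",
    re.IGNORECASE,
)

# Assumed keyword list of EOR methods, for illustration only.
EOR_KEYWORDS = [
    "SAGD",
    "cyclic steam stimulation",
    "steam flooding",
    "in-situ combustion",
    "polymer flooding",
]


def extract_parameters(text):
    """Return porosity (%), permeability (mD) and EOR methods found in text."""
    porosity = [float(m.group(1)) for m in POROSITY_RE.finditer(text)]

    permeability = []
    for m in PERMEABILITY_RE.finditer(text):
        value = float(m.group(1).replace(",", ""))
        # Normalize darcy to millidarcy (1 D = 1000 mD).
        if m.group(2).lower().startswith("d"):
            value *= 1000.0
        permeability.append(value)

    # Whole-word, case-insensitive keyword lookup.
    methods = [
        kw for kw in EOR_KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
    ]

    return {
        "porosity_pct": porosity,
        "permeability_md": permeability,
        "eor_methods": methods,
    }
```

Calling `extract_parameters` on one abstract returns a small dictionary of candidate values. In a distributed setting, a function like this could be called once per abstract from a map task, with cleaning and aggregation done afterwards; that arrangement is also an assumption, not a description of the authors' pipeline.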
Introduction

The oil and gas industry has accumulated a huge amount of exploration and production data in various formats. Moreover, the rate of data growth is accelerating as more data is acquired from smart fields, real-time sensors, 2D, 3D and 4D time-lapse seismic, and complex reservoir modeling and simulation, all of which have been widely used in the E&P industry for some years. Chevron's internal IT traffic alone was reported to exceed 1.5 terabytes a day in 2011 (Feblowitz, 2012). In addition, the proportion of unstructured data within big data is increasing. How to store and analyze big data, and how to retrieve value and knowledge from it, is a challenge that the oil industry has to confront. When the data volume grows to hundreds of terabytes, conventional databases and data analytics methods usually cannot cope, given the time and cost they would require.
