This article, written by Special Publications Editor Adam Wilson, contains highlights of paper SPE 181015, “Natural-Language-Processing Techniques on Oil and Gas Drilling Data,” by M. Antoniak, J. Dalgliesh, SPE, and M. Verkruyse, Maana, and J. Lo, Chevron, prepared for the 2016 SPE Intelligent Energy International Conference and Exhibition, Aberdeen, 6–8 September. The paper has not been peer reviewed.
Recent advances in search, machine learning, and natural-language processing have made it possible to extract structured information from free text, providing a new and largely untapped source of insight for well and reservoir planning. However, major challenges are involved in applying these techniques to data that are messy or that lack a labeled training set. This paper presents a method to compare the distribution of hypothesized and realized risks to oil wells described in two data sets that contain free-text descriptions of risks.
Introduction
In the oil and gas industry, risk identification and risk assessment are critical. This holds particularly true during the drilling stages, which cannot begin before a risk assessment is conducted. While these risk assessments are typically conducted in a group setting, the project drilling engineer usually has a predetermined list of risks and likelihood scores that are the focus of the conversation. One problem with this approach is that drilling engineers are inherently biased by personal experiences, which can affect their view on how likely an event is to happen. For example, if a project drilling engineer recently encountered well-control issues, the engineer will likely overestimate the chance of future well-control issues. On the other hand, if the engineer has never encountered a well-control issue, it may be unintentionally omitted altogether from the risk assessments.
Using historical data as a barometer could help the drilling engineer overcome these issues, though doing so requires a unified view of both prior risk assessments and prior issues encountered. Chevron maintains both data sets in disparate systems. The Risk Assessment database contains descriptions of risks from historical risk assessments, and the Well Operations database contains descriptions of unexpected events together with the unexpected-event codes that categorize them. Leveraging both, a system has been created in which a project drilling engineer enters a risk in natural language. The system then returns the drilling codes related to that risk, produces statistics showing how often these types of events have happened in the past, and predicts the likelihood of the problem occurring in certain fields.
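The workflow described above (free-text risk in, related event codes and historical frequencies out) can be illustrated with a minimal sketch. It is not the method used in the paper, which this summary does not detail. It assumes a simple bag-of-words cosine similarity for matching and a plain historical frequency as a stand-in for the likelihood estimate. The event codes, field names, descriptions, and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical unexpected-event records: (event_code, field, description).
# These rows are invented for illustration; they are not Chevron data.
EVENTS = [
    ("WELL_CONTROL", "Field A", "kick detected while drilling, well shut in"),
    ("WELL_CONTROL", "Field B", "gas influx, well control procedures started"),
    ("STUCK_PIPE", "Field A", "drill string stuck, pipe could not be rotated"),
    ("LOST_CIRC", "Field A", "lost circulation, mud losses to formation"),
    ("STUCK_PIPE", "Field B", "stuck pipe while tripping out of hole"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_codes(risk_text, events):
    """Rank event codes by their best-matching description for the risk text."""
    query = Counter(tokenize(risk_text))
    best = defaultdict(float)
    for code, _field, description in events:
        score = cosine(query, Counter(tokenize(description)))
        best[code] = max(best[code], score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


def code_frequency(code, events, field=None):
    """Share of recorded events with this code, optionally within one field."""
    pool = [e for e in events if field is None or e[1] == field]
    if not pool:
        return 0.0
    return sum(1 for e in pool if e[0] == code) / len(pool)


if __name__ == "__main__":
    ranked = rank_codes("risk of stuck pipe", EVENTS)
    top_code = ranked[0][0]
    print(top_code, code_frequency(top_code, EVENTS, field="Field A"))
```

A production system would likely need more robust text matching (for example, weighting rare terms or handling abbreviations and misspellings), because drilling reports are messy and, as noted above, often lack a labeled training set.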