Two CO2 storage sites in the western Norwegian North Sea (NNS), Aurora and Smeaheia, are currently under construction and under assessment, respectively. In geological storage of CO2, the in situ minimum horizontal stress is an essential input for assessing both containment and induced-seismicity risks [1]. To infer the stress state at depths where no data are available, the standard approach is to perform a classical linear regression of stress data against depth and to treat the fitted trend line as the best site-specific stress prediction with depth [2]. However, stress data are often very limited at CO2 storage sites; for example, Aurora and Smeaheia each have at most five in situ stress measurements available. Such limited data may severely underrepresent the true stress distribution at a site. Data scarcity, coupled with measurement error and spatial variability, poses a challenge to reliable stress prediction, and it is therefore crucial to quantify and reduce uncertainty in site-specific stress prediction for CO2 storage. Stress uncertainty is, moreover, a required input to the more rational probabilistic risk assessment framework. A natural way to reduce uncertainty is to integrate stress information from other sources.
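The classical regression baseline described above can be sketched as follows. This is a minimal illustration only: the depth and stress values are synthetic placeholders (not measurements from Aurora, Smeaheia, or the NPD database), the function names are hypothetical, and the Student-t critical value is hard-coded for five data points. It shows how wide a 90% prediction interval can become when the trend line is fitted to only five measurements.

```python
import numpy as np

def fit_linear_trend(depth_m, stress_mpa):
    """Ordinary least squares fit: stress = intercept + slope * depth."""
    depth_m = np.asarray(depth_m, dtype=float)
    stress_mpa = np.asarray(stress_mpa, dtype=float)
    X = np.column_stack([np.ones_like(depth_m), depth_m])  # design matrix
    beta, *_ = np.linalg.lstsq(X, stress_mpa, rcond=None)
    resid = stress_mpa - X @ beta
    dof = len(depth_m) - 2  # two fitted parameters
    sigma = np.sqrt(resid @ resid / dof)  # residual standard error
    return beta, sigma, X

def prediction_interval(beta, sigma, X, depth_new, t_crit):
    """Prediction interval for a single new measurement at depth_new."""
    x0 = np.array([1.0, depth_new])
    xtx_inv = np.linalg.inv(X.T @ X)
    se_pred = sigma * np.sqrt(1.0 + x0 @ xtx_inv @ x0)
    mean = x0 @ beta
    return mean - t_crit * se_pred, mean, mean + t_crit * se_pred

# Synthetic values for illustration only (NOT real site data).
depth = [1200.0, 1500.0, 1900.0, 2300.0, 2700.0]   # m
stress = [20.5, 26.8, 32.1, 41.0, 46.2]            # MPa

beta, sigma, X = fit_linear_trend(depth, stress)

# Two-sided 90% Student-t critical value for n - 2 = 3 degrees of freedom.
T_CRIT_90_DOF3 = 2.353
lower, mean, upper = prediction_interval(beta, sigma, X, 2000.0, T_CRIT_90_DOF3)

print(f"slope = {beta[1] * 1000:.2f} MPa/km, intercept = {beta[0]:.2f} MPa")
print(f"90% PI at 2000 m: [{lower:.1f}, {upper:.1f}] MPa (mean {mean:.1f})")
```

With so few points, the interval half-width is driven as much by the large t critical value and the parameter uncertainty as by the measurement scatter itself, which is the motivation for borrowing information from other sites.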
On the Norwegian continental shelf, extensive data have been accumulated from previous petroleum projects. Drawing on the publicly available NPD stress database, Figure 1a shows the distribution of minimum horizontal stress versus depth (< 3,000 m) for each site within the study area containing Aurora and Smeaheia, and reveals a degree of similarity between the stress trends at the 11 sites. This similarity, which is consistent with other published results [2], may be attributed to the relaxed sedimentary basin setting, in which gravitational loading, rather than tectonic components, dominates the lateral stress distribution; the between-site variation arises from differences in geological conditions and pore pressures [3]. When data are limited for a site such as Aurora or Smeaheia, the current approach is often either to use the stress trend from other, data-rich sites directly or to expand the coverage area to include more data. Such a semi-subjective approach to borrowing information, although effective in many cases, may lead to overly confident stress predictions because it fails to account for possible between-site heterogeneity in stress trends.
Bayesian inference is widely used as a rigorous and powerful statistical approach for quantifying uncertainty and for combining information from different sources via informative prior distributions. Historical stress data may therefore be integrated into a Bayesian analysis of site-specific data in the form of prior distributions, with stress uncertainties quantified and updated as posterior distributions [4, 5]. When developing prior distributions for site-specific stress prediction, it may be tempting to combine all historical stress data in a single holistic Bayesian analysis, yet such a complete pooling approach may give an overconfident summary of prior information because it ignores possible stress heterogeneity between sites.
This paper presents a Bayesian hierarchical (i.e., partial pooling) model (BHM) that explicitly accounts for between-site heterogeneity and similarity when constructing prior distributions from historical stress data, and demonstrates how the proposed model effectively borrows historical information to reduce uncertainty in site-specific stress prediction for CO2 storage in the NNS study area. Figure 1b illustrates the prior predictions of minimum horizontal stress versus depth at the Aurora site from the Bayesian complete and partial pooling models. Although the complete pooling model gives less uncertain stress predictions than the partial pooling model, as indicated by its narrower 90% prediction intervals (PIs), it does not capture the five unseen stress measurements at Aurora well: two of the five stress values fall outside the 90% PIs. This suggests that complete pooling does indeed give overconfident prior distributions from the NPD database, and is thus not suitable for integrating historical data into site-specific stress prediction for CO2 storage in the NNS. The partial pooling model, by contrast, gives fairly good prior predictions of the five unseen stress values at Aurora, albeit with larger uncertainties. This result demonstrates the effectiveness of the BHM as a framework for formulating proper informative priors from historical data. An encouraging implication is that probabilistic risk assessment becomes feasible even with no site-specific stress data at this storage site, which is not possible unless external information is integrated properly. Figure 1c shows the posterior stress predictions from the two Bayesian models after updating with the five site-specific stress values. After the site-specific data are incorporated, the complete pooling model still over-predicts the two stress values at depth, with barely noticeable updating, while partial pooling gives considerably more accurate stress predictions with reduced uncertainty.
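The contrast between complete and partial pooling can be illustrated with a deliberately simplified sketch. It is not the BHM of this paper: it replaces the stress-versus-depth model with a single stress gradient per site, assumes normal distributions throughout, and estimates the between-site variance with the DerSimonian-Laird moment estimator rather than by full Bayesian inference. All site gradients, standard errors, and function names below are hypothetical and synthetic.

```python
import numpy as np

def complete_pooling_prior(g, s):
    """All sites share one gradient; prior for a new site is the pooled mean."""
    w = 1.0 / s**2
    mean = np.sum(w * g) / np.sum(w)
    var = 1.0 / np.sum(w)  # ignores any between-site variation
    return mean, var

def partial_pooling_prior(g, s):
    """Random-effects prior: site gradients scatter around a common mean."""
    w = 1.0 / s**2
    mean_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - mean_fixed) ** 2)
    k = len(g)
    denom = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)  # between-site variance (moment estimate)
    w_star = 1.0 / (s**2 + tau2)
    mu = np.sum(w_star * g) / np.sum(w_star)
    var_mu = 1.0 / np.sum(w_star)
    # A new site's gradient carries both the uncertainty in mu and the spread tau2.
    return mu, var_mu + tau2, tau2

def normal_update(prior_mean, prior_var, y, se):
    """Conjugate normal update of the prior with one site-specific estimate."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_var + y / se**2)
    return post_mean, post_var

# Synthetic gradients (MPa/km) and standard errors for six hypothetical sites.
g = np.array([17.0, 18.5, 16.2, 19.4, 17.8, 15.9])
s = np.array([0.4, 0.5, 0.3, 0.6, 0.4, 0.5])

cp_mean, cp_var = complete_pooling_prior(g, s)
pp_mean, pp_var, tau2 = partial_pooling_prior(g, s)

# Hypothetical estimate from sparse data at the new (target) site.
y_new, se_new = 19.0, 0.8
cp_post_mean, cp_post_var = normal_update(cp_mean, cp_var, y_new, se_new)
pp_post_mean, pp_post_var = normal_update(pp_mean, pp_var, y_new, se_new)

print(f"complete pooling: prior sd {np.sqrt(cp_var):.2f}, posterior mean {cp_post_mean:.2f}")
print(f"partial pooling:  prior sd {np.sqrt(pp_var):.2f}, posterior mean {pp_post_mean:.2f}")
```

In this toy setting the complete pooling prior is so narrow that the site-specific estimate barely moves it, whereas the partial pooling prior is wider and lets the new data shift the posterior, which mirrors qualitatively the behaviour reported for Figures 1b and 1c.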