The volume of data routinely acquired in many industrial geoscientific settings has increased to the point that it poses serious technical challenges to store and process. Optimal experimental design is a tool that can address some of these challenges. First, before data acquisition, optimal design can forecast optimum acquisition geometries and can thus be used to minimize the acquired data volume and reduce the cost of the data life cycle. Second, after data acquisition, optimal design can select the most informative data to process, thus minimizing the data volume throughput in processing workflows. The catch is that optimal survey design is itself subject to the computational difficulties of big data. We have developed a parallelizable dimensionality reduction method, based on a sparse rank-revealing QR factorization, to reduce the computing and storage costs of optimal survey design. We implemented the distributable procedure for guided Bayesian survey design and applied it as part of a workflow to optimize a marine seismic survey of more than 100 million source-receiver pairs in the Gulf of Mexico. The method improved computation times by more than an order of magnitude and reduced memory requirements by nearly two orders of magnitude. Although the procedure trades some approximation for computational efficiency, the marine survey design results were identical to those achieved without compression. The distributable reduction method may be generally useful for matrix reduction problems in which the matrix data are distributed over multiple machines.
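As a rough illustration of the kind of distributable rank-revealing QR reduction the abstract describes, the following Python sketch compresses each node-local row block independently and then compresses the stacked reduced factors once more. This is a generic divide-and-conquer pattern assumed for illustration, not the authors' exact scheme; the function names, the tolerance parameter, and the block-splitting are all hypothetical.

```python
# Minimal sketch of distributed rank-revealing QR compression.
# Assumptions: dense blocks, SciPy's pivoted QR as the rank-revealing
# factorization, and a two-level reduce; not the paper's implementation.
import numpy as np
from scipy.linalg import qr

def rrqr_compress(A, tol=1e-8):
    """Compress A to its leading numerical row space via QR with
    column pivoting; returns the truncated R factor and the pivots."""
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    diag = np.abs(np.diag(R))
    # Numerical rank: keep rows while the pivot magnitude stays
    # above tol relative to the largest pivot.
    k = int(np.sum(diag > tol * diag[0])) if diag.size else 0
    return R[:k, :], piv

def distributed_compress(blocks, tol=1e-8):
    """Each 'machine' compresses its row block independently; the
    small reduced factors are stacked and compressed once more."""
    reduced = []
    for A in blocks:                   # in practice, one block per node
        Rk, piv = rrqr_compress(A, tol)
        inv = np.argsort(piv)          # undo the local column pivoting
        reduced.append(Rk[:, inv])
    return rrqr_compress(np.vstack(reduced), tol)

# Toy usage: a tall, numerically low-rank matrix split across 4 "nodes".
rng = np.random.default_rng(0)
A = rng.standard_normal((4000, 50)) @ rng.standard_normal((50, 200))
R_final, _ = distributed_compress(np.array_split(A, 4))
print(R_final.shape)   # far fewer rows than the original 4000
```

Because each block is reduced locally before any communication, only the small R factors move between machines, which is what makes this style of reduction attractive when the matrix rows are distributed.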