Biodegradation partially obscures the biogenic information carried by different components of crude oil, so traditional molecular indicators cannot reliably determine the contributions of mixed sources. Previous studies have mainly calculated mixed-source proportions for normal oils, and few have addressed biodegraded mixed oils. Developing a quantitative, machine learning (ML) based method for the mixed-source contributions of biodegraded oils is therefore significant for both exploration practice and theoretical progress. Using biodegradation-resistant biomarker parameters from 150 oil samples as the input dataset, this study proposes a new method called Pareto-Optimized NMF (PONMF), which combines multi-objective (Pareto) optimization with non-negative matrix factorization (NMF). The method analyzes the relationships among biodegraded oils in the absence of source rock samples or original end-member (EM) oils, while satisfying non-negativity and additional constraints. It decomposes the matrix of crude oil samples into multiple end-members, each representing a possible input from a hydrocarbon source rock. Analysis of the end-member characteristics allows the potential hydrocarbon source layers and their depositional environments to be deduced. Data from 15 prospective hydrocarbon source rock samples were used to establish the geological significance of the end-members. A chord diagram visualizes the overall relative contribution of the source rock end-members to the crude oil in each area, as calculated by PONMF; the arc length reflects the contribution of each of the three sets of source rocks to a given area.
The findings indicate that the oils in the stratified reservoirs of Eastern Chepaizi are derived primarily from the Permian source strata of the Shawan Depression (EM2), which contribute 30% to 90%, and to a lesser extent from Carboniferous sources (EM1), which contribute 5% to 30%. In contrast, the oil in Western Chepaizi is overwhelmingly attributed to the Jurassic source interval in the Sikeshu Depression, with EM3 accounting for over 85% of the contribution. The oils in the Central Zone have a composite origin, mixing contributions from both EM1 and EM3. The advantage of this method is that it does not require direct access to hydrocarbon source rock samples or end-member oils; instead, it infers potential hydrocarbon source layers from the existing mixed-source oil samples. The PONMF method provides an effective tool for identifying the sources of biodegraded oils and can be applied to source rock analysis of oils that were biodegraded after hydrocarbon accumulation.
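The decomposition step described above can be illustrated with a minimal sketch of plain NMF. This is not the PONMF method itself: the Pareto optimization and the additional constraints are omitted, the multiplicative-update solver is a swapped-in standard technique, and the data are synthetic. The function names (`nmf`, `contributions`), the choice of six parameters, and the mixing model are assumptions made only for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Plain NMF, X ~ W @ H, via multiplicative updates (Frobenius loss).

    Illustrative stand-in only: no Pareto optimization, no extra constraints.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps   # sample-by-end-member weights
    H = rng.random((k, m)) + eps   # end-member-by-parameter profiles
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each end-member profile sums to 1; W @ H is unchanged.
    scale = H.sum(axis=1)
    return W * scale[None, :], H / scale[:, None]

def contributions(W):
    """Row-normalize weights into relative end-member contributions."""
    return W / W.sum(axis=1, keepdims=True)

# Synthetic stand-in for the real dataset (assumed shapes: 150 oils,
# 6 biodegradation-resistant parameters, 3 end-members).
rng = np.random.default_rng(42)
true_profiles = rng.random((3, 6)) + 0.1          # hypothetical EM profiles
true_mix = rng.dirichlet(np.ones(3), size=150)    # hypothetical mixing ratios
X = true_mix @ true_profiles                      # non-negative data matrix

W, H = nmf(X, k=3)
P = contributions(W)                              # one row per oil sample
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each row of `P` gives the estimated relative contribution of the three end-members to one oil sample, and each row of `H` gives the corresponding end-member profile. NMF solutions are not unique, so the recovered end-members need not match the true ones without further constraints, which is presumably part of what the additional objectives in PONMF address.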