Abstract

This study aims to predict asphaltene stability in crude oil from its SARA (Saturates, Aromatics, Resins, and Asphaltenes) values using machine learning (ML) algorithms. Four ML algorithms, namely linear discriminant analysis (LDA), linear regression (LR), decision tree (DT), and random forest (RF), were applied to predict the asphaltene stability of 95 crude oils, with the SARA values of the samples as input features. The dataset was also visualized comprehensively using different plots. For model implementation, 70% of the dataset was used for training and the remaining 30% for testing. To assess model reliability, 10-fold cross-validation was applied. Recursive feature elimination (RFE) was also applied to determine the effect of feature elimination on model accuracy. The performance of all ML algorithms was evaluated using various statistical metrics. For all ML models, accuracy was in the range of 60–80% during both the training and testing phases. In the training phase, LDA had the highest accuracy (about 78%) and DT the lowest (62%). In the testing phase, by contrast, DT had the highest accuracy and LDA the lowest. The overall accuracy of LDA was observed to be the highest. All ML algorithms identified the Resins feature as the most important parameter for predicting asphaltene stability in crude oils. To check model robustness, all ML models were also run at two other training-to-testing ratios, 80:20 and 60:40. At both ratios, the models generally yielded accuracies in the range of 60–80% during the training and testing phases, similar to the 70:30 base case. Finally, the moderate accuracy of the ML models and the overall outcome of the study suggest that SARA values alone cannot reliably determine asphaltene stability in crude oil, and that more features need to be identified and incorporated to improve model accuracy.
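
The workflow summarized in the abstract (a 70:30 split, 10-fold cross-validation, four classifiers, and RFE) could be sketched roughly as follows with scikit-learn. This is an illustrative sketch and not the authors' code. The data are synthetic placeholders rather than the 95 crude oils of the study, and the labels, hyperparameters, random seeds, and stratified splitting are all assumptions. The abstract expands LR as linear regression; because stability prediction is a classification task, the sketch substitutes scikit-learn's `LogisticRegression`, which is also an assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Saturates", "Aromatics", "Resins", "Asphaltenes"]

# Synthetic placeholder data (NOT the study's dataset): 95 samples with four
# SARA fractions in wt%, and an arbitrary noisy binary "stable" label.
rng = np.random.default_rng(0)
X = rng.dirichlet(alpha=[5.0, 3.0, 1.5, 0.8], size=95) * 100.0
y = (X[:, 2] + rng.normal(0.0, 4.0, size=95) > np.median(X[:, 2])).astype(int)

# 70:30 split as stated in the abstract; stratification is an added assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Default-ish hyperparameters, chosen for illustration only.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),  # stand-in classifier for "LR"
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation on the training portion, then a held-out test score.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    cv_acc = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    model.fit(X_train, y_train)
    results[name] = {
        "cv": cv_acc,
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }
    print(name, {k: round(float(v), 2) for k, v in results[name].items()})

# Recursive feature elimination down to one feature; rank 1 is the last survivor.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=1)
rfe.fit(X_train, y_train)
ranking = dict(zip(FEATURES, rfe.ranking_))
print(ranking)
```

Repeating the loop with `test_size=0.20` and `test_size=0.40` would mirror the 80:20 and 60:40 robustness runs. Because the data here are synthetic, the printed accuracies and feature ranking say nothing about the study's reported results.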

