Evaluation of various machine learning-based bias correction approaches for NASA POWER air temperatures: a case study of Nigeria
ABSTRACT Remotely sensed air temperature data from NASA POWER are widely used in regions with scarce climatic observations, particularly for agricultural applications such as calculating crop water requirements. This study employed a suite of machine learning (ML) algorithms to correct biases in NASA POWER air temperature outputs, including multiple support vector regression (SVR) variants—Linear SVR, Quadratic SVR, Cubic SVR, Fine Gaussian SVR, Medium Gaussian SVR, Coarse Gaussian SVR—and ensemble decision tree models: bagged trees (BGT) and boosted trees (BT). The objective of this study was to assess the ability of different ML algorithms to reduce biases in NASA POWER air temperature data, with the broader goal of identifying the most suitable ML method for air temperature bias correction in Nigeria. For this analysis, we used daily air temperature records from seven meteorological stations across diverse regions of Nigeria. The performance of NASA POWER minimum and maximum air temperature datasets was evaluated using standard error metrics. Subsequent application of ML algorithms significantly improved data accuracy: the normalized root mean square error (NRMSE) of the corrected outputs was mostly below 10%, indicating excellent predictive performance when ML was integrated. Among the SVR variants tested, Fine Gaussian SVR consistently yielded the best prediction results. This finding suggests that Fine Gaussian SVR is a robust tool for enhancing the reliability of air temperature data—critical for improving the accuracy of crop water requirement calculations in regions where in-situ air temperature observations are limited.
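The NRMSE figure quoted above can be computed in a few lines. The sketch below is a minimal illustration, not the paper's code: NRMSE is normalized here by the range of the observations (normalizing by the observed mean is another common convention), and the temperature values are invented.

```python
import numpy as np

def nrmse(observed, predicted):
    """Root mean square error normalized by the range of the observations,
    expressed as a percentage. (Normalizing by the observed mean is another
    common convention.)"""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return 100.0 * rmse / (observed.max() - observed.min())

# Invented station temperatures (deg C) vs. bias-corrected estimates
obs = [24.1, 26.3, 29.8, 31.2, 33.5]
est = [24.6, 25.9, 30.1, 30.8, 33.9]
print(f"NRMSE = {nrmse(obs, est):.2f}%")
```

A value below 10%, as reported for the corrected outputs, is typically read as excellent agreement.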
- Research Article
24
- 10.1016/j.catena.2022.106404
- May 28, 2022
- CATENA
Machine learning for cation exchange capacity prediction in different land uses
- Research Article
24
- 10.1016/j.isprsjprs.2023.05.015
- May 24, 2023
- ISPRS Journal of Photogrammetry and Remote Sensing
Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data
- Research Article
9
- 10.1016/j.trf.2023.08.007
- Sep 15, 2023
- Transportation Research Part F: Traffic Psychology and Behaviour
Driver drowsiness modeling based on spatial factors and electroencephalography using machine learning methods: A simulator study
- Research Article
17
- 10.1016/j.compag.2022.107457
- Nov 3, 2022
- Computers and Electronics in Agriculture
Machine learning-based cloud computing improved wheat yield simulation in arid regions
- Research Article
6
- 10.1186/s40537-024-00991-w
- Sep 18, 2024
- Journal of Big Data
In late 2023, the United Nations conference on climate change (COP28), held in Dubai, encouraged a rapid transition from fossil fuels to renewable energy. Solar energy is one of the most promising sustainable and renewable forms of energy. Photovoltaic (PV) systems transform solar irradiance into electricity; unfortunately, instability and intermittency in solar radiation can interrupt electricity production. Accurate forecasting of solar irradiance supports sustainable power production even during periods without sunlight, since batteries can store solar energy for use when irradiance is absent. Additionally, deterministic models depend on the specifications of the technical PV system and may not be accurate at low solar irradiance. This paper presents a comparative study of the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad over a five-year period (2015 to 2019) at hourly intervals with accurate meteorological sensors. Furthermore, Grid Search Cross Validation (GSCV) with five folds is applied to the ML and DL models to optimize their hyperparameters. Several performance metrics are used to assess the algorithms: the Adjusted R2 score, Normalized Root Mean Square Error (NRMSE), Mean Absolute Deviation (MAD), Mean Absolute Error (MAE), and Mean Square Error (MSE). The statistical analysis shows that CNN-LSTM outperforms nine well-known DL counterparts, with an Adjusted R2 score of 0.984. Among the ML algorithms, gradient boosting regression is an effective forecasting method with an Adjusted R2 score of 0.962, beating its six ML rivals. Furthermore, SHAP and LIME, examples of explainable Artificial Intelligence (XAI), are used to understand the reasons behind the obtained results.
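The Adjusted R2 score used above to rank the models penalizes added predictors, unlike plain R2. A minimal sketch of the standard formula, with invented irradiance values and an assumed three-feature model (not the paper's data or code):

```python
import numpy as np

def adjusted_r2(y_true, y_pred, n_features):
    """R^2 adjusted for model complexity:
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    n = y_true.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

# Invented hourly irradiance readings (W/m^2) and model predictions
y = [120.0, 340.0, 560.0, 610.0, 450.0, 230.0]
yhat = [130.0, 330.0, 540.0, 620.0, 460.0, 250.0]
print(f"Adjusted R2 = {adjusted_r2(y, yhat, n_features=3):.3f}")
```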
- Research Article
- 10.1088/1755-1315/1170/1/012004
- Apr 1, 2023
- IOP Conference Series: Earth and Environmental Science
Application of machine learning algorithms to crop yield simulation has attracted growing attention from scientists in recent years. The objective of this study is to estimate coffee yields in Dak Lak province using three machine learning algorithms: artificial neural network (ANN), support vector regression (SVR), and random forest (RF). Input data for the simulations include maximum and minimum temperature, effective rainfall, reference evapotranspiration, and crop water requirement over the period 2000-2020, with 70% of the data used for the training phase and 30% for the testing phase. The results indicated that all three machine learning models (SVR, ANN, and RF) perform reasonably in simulating coffee yield; among them, the RF model performs best, with NSE values of approximately 0.918 for the training phase and 0.818 for the testing phase.
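The NSE values reported above follow the standard Nash-Sutcliffe efficiency formula: 1 is a perfect match, and 0 means the model is no better than predicting the observed mean. A minimal sketch with invented yield figures (not the study's data):

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot, where SS_tot is
    taken about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented coffee yields (tonnes/ha): observations vs. model output
obs = [2.1, 2.4, 2.8, 2.6, 3.0, 2.2]
sim = [2.0, 2.5, 2.7, 2.7, 2.9, 2.3]
print(f"NSE = {nse(obs, sim):.3f}")
```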
- Research Article
20
- 10.3390/rs11232847
- Nov 29, 2019
- Remote Sensing
Surface shortwave net radiation (SSNR) flux is essential for determining the radiation energy balance between the atmosphere and the Earth's surface. Satellite-derived intermediate-resolution SSNR data are strongly needed to bridge the gap between existing coarse-resolution SSNR products and point-based measurements. In this study, four different machine learning (ML) algorithms were tested for estimating SSNR from Landsat Thematic Mapper (TM)/Enhanced Thematic Mapper Plus (ETM+) top-of-atmosphere (TOA) reflectance and other ancillary information (i.e., clearness index, water vapor) at instantaneous and daily scales under all-sky conditions. The four ML algorithms are multivariate adaptive regression splines (MARS), backpropagation neural network (BPNN), support vector regression (SVR), and gradient boosting regression tree (GBRT). Collected in-situ measurements were used to train the global model (using all data) and the conditional models (in which the data were divided into subsets and the models were fitted separately). The validation results indicated that the GBRT-based global model (GGM) performs best at both the instantaneous and daily scales. For example, the GGM based on the TM data yielded coefficients of determination (R2) of 0.88 and 0.94, average root mean square errors (RMSE) of 73.23 W·m−2 (15.09%) and 18.76 W·m−2 (11.2%), and biases of 0.64 W·m−2 and −1.74 W·m−2 for instantaneous and daily SSNR, respectively. Compared to the Global LAnd Surface Satellite (GLASS) daily SSNR product, the daily TM-SSNR showed a very similar spatial distribution but with more detail. Further analysis also demonstrated the robustness of the GGM across various land cover types, elevations, general atmospheric conditions, and seasons.
- Research Article
2
- 10.1017/s1049023x24000414
- May 17, 2024
- Prehospital and Disaster Medicine
The aim of this study was to summarize the literature on the applications of machine learning (ML) and their performance in Emergency Medical Services (EMS). Four relevant electronic databases were searched (from inception through January 2024) for all original studies that employed EMS-guided ML algorithms to enhance the clinical and operational performance of EMS. Two reviewers screened the retrieved studies and extracted relevant data from the included studies. The characteristics of the included studies, the employed ML algorithms, and their performance were quantitatively described across primary domains and subdomains. This review included a total of 164 studies published from 2005 through 2024. Of those, 125 were clinical-domain focused and 39 were operational. The characteristics of the ML algorithms, such as sample size, number and type of input features, and performance, varied between and within domains and subdomains of application. Clinical applications of ML algorithms involved triage or diagnosis classification (n = 62), treatment prediction (n = 12), or clinical outcome prediction (n = 50), mainly for out-of-hospital cardiac arrest/OHCA (n = 62), cardiovascular diseases/CVDs (n = 19), and trauma (n = 24). The performance of these ML algorithms varied, with a median area under the receiver operating characteristic curve (AUC) of 85.6%, accuracy of 88.1%, sensitivity of 86.05%, and specificity of 86.5%. Within the operational studies, the operational task of most ML algorithms was ambulance allocation (n = 21), followed by ambulance detection (n = 5), ambulance deployment (n = 5), route optimization (n = 5), and quality assurance (n = 3). The performance of the operational ML algorithms also varied, with a median AUC of 96.1%, accuracy of 90.0%, sensitivity of 94.4%, and specificity of 87.7%. Generally, neural network and ensemble algorithms, to some degree, outperformed other ML algorithms.
ML algorithms can improve the triage and management of different prehospital medical conditions and augment ambulance performance. Future reports should focus on a specific clinical condition or operational task to improve the precision of the performance metrics of ML models.
- Research Article
4
- 10.1016/j.mtsust.2022.100201
- Jul 21, 2022
- Materials Today Sustainability
State of health prediction of supercapacitors using multi-trend learning of NARX neural network
- Research Article
- 10.1249/mss.0000000000003727
- Apr 14, 2025
- Medicine and Science in Sports and Exercise
ABSTRACT Purpose: Optimal performance in military tasks is crucial for operational success. These tasks are often simulated in training to assess personnel performance within a military environment. However, these assessments are time-consuming and pose a potential injury risk. Physical characteristics such as muscular strength, power, aerobic endurance, and circumferences can be used to predict performance on these dynamic and demanding tasks. Utilizing machine learning models to predict assessment outcomes may lead to optimized management of personnel, time, and interventions in the military. Methods: This study recruited 35 participants to complete two physical sessions assessing multiple physical characteristics along with lift-to-place and jerry-can-carry assessments. Machine learning models were developed to predict assessment outcomes based on a down-selection of physical characteristic metrics. Root mean square error (RMSE), normalized root mean square error (NRMSE), and coefficient of variation of the root mean square error (CVRMSE) were used to evaluate the models' predictive capabilities. Results: The support vector regression (SVR) and ridge models predicted the lift-to-place outcome to an RMSE of ±1.77 kg (NRMSE = 4.44%, CVRMSE = 0.18) and ±2.33 kg (NRMSE = 5.84%, CVRMSE = 0.24) with four and three physical tests, respectively. The multilayer perceptron and SVR models predicted the jerry-can-carry outcome to ±3.36 laps (NRMSE = 23.06%, CVRMSE = 0.39) and ±3.67 laps (NRMSE = 25.20%, CVRMSE = 0.42) with 12 and 8 physical tests, respectively. Conclusions: The lift-to-place outcome can be accurately predicted, showing potential for military implementation. The jerry-can-carry outcome shows promise; however, further model optimization and additional training metrics are required to reduce error. Machine learning models demonstrate their applicability to optimizing occupational selection pathways and training interventions for desirable performance in military settings.
- Book Chapter
12
- 10.1515/9783110702514-005
- Jul 5, 2021
Nowadays, facial expression analysis (FEA) is becoming an important application in various fields such as medicine, education, entertainment, and crime analysis, because it enables analysis where no verbal communication is possible. FEA is performed after face recognition, and its quality depends on how efficiently features are extracted. Classification therefore plays a vital role in producing the output needed to identify the correct expression. Machine learning (ML) and deep learning algorithms are useful for classifying data, whether structured (such as text) or unstructured (such as images and videos). Image input is preferred because a face image carries information, such as the texture of facial features, age, gender, and shape, that cannot be described properly by a textual annotation of the corresponding image. The system can be built in different ways: deep learning algorithms can be applied to raw data, or ML algorithms can be applied to preprocessed images, depending on the user requirement. This chapter discusses the challenges, the promising ML algorithms, and the efficient deep learning algorithms for recognizing human expressions automatically, supporting significant areas such as human-computer interaction, psychology in the medical field, and especially the analysis of the behavior of suspected people in crowded areas such as airports. In recent years, ML algorithms have become very popular in the field of data retrieval for improving its efficiency and accuracy. State-of-the-art ML-based image retrieval plays an imperative role in decreasing the semantic gap between user expectations and the images available in the database. This chapter presents a comprehensive study of ML algorithms: supervised, unsupervised, and combinations of both.
Furthermore, various ML algorithms are demonstrated for image classification and clustering, with a summary and comparison of ML algorithms on datasets such as COREL and a face image database. Finally, the chapter concludes with the challenges of ML algorithms in image retrieval and a few recommendations.
- Research Article
85
- 10.1089/fpsam.2019.29000.gua
- Feb 1, 2020
- Facial Plastic Surgery & Aesthetic Medicine
Importance: Quantitative assessment of facial function is challenging, and subjective grading scales such as House-Brackmann, Sunnybrook, and eFACE have well-recognized limitations. Machine learning (ML) approaches to facial landmark localization carry great clinical potential, as they enable high-throughput automated quantification of relevant facial metrics from photographs and videos. However, the translation from research settings to clinical application still requires important improvements. Objective: To develop a novel ML algorithm for fast and accurate localization of facial landmarks in photographs of facial palsy patients and to utilize this technology as part of an automated computer-aided diagnosis system. Design, Setting, and Participants: Portrait photographs of 8 expressions obtained from 200 facial palsy patients and 10 healthy participants were manually annotated by 3 trained clinicians, who localized 68 facial landmarks in each photograph using a custom graphical user interface. A novel ML model for automated facial landmark localization was trained using this disease-specific database. Algorithm accuracy was compared with the manual markings and with the output of a model trained using a larger database consisting only of healthy subjects. Main Outcomes and Measurements: Root mean square error of facial landmark localization, normalized by the interocular distance (NRMSE), between the ML algorithm's predictions and the manually localized landmarks. Results: Publicly available algorithms for facial landmark localization provide poor localization accuracy when applied to photographs of patients compared with photographs of healthy controls (NRMSE, 8.56 ± 2.16 vs. 7.09 ± 2.34, p ≪ 0.01).
We found significant improvement in facial landmark localization accuracy for the facial palsy patient population when using a model trained with a relatively small number of photographs (1,440) of patients compared with a model trained using several thousand more images of healthy faces (NRMSE, 6.03 ± 2.43 vs. 8.56 ± 2.16, p ≪ 0.01). Conclusions and Relevance: Retraining a computer vision facial landmark detection model with fewer than 1,600 annotated images of patients significantly improved landmark detection performance in frontal view photographs of this population. The new annotated database and facial landmark localization model represent the first steps toward an automatic system for computer-aided assessment in facial palsy. Level of Evidence: 4.
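The NRMSE in this study is the landmark localization error normalized by the interocular distance. A small sketch of that metric with invented coordinates; the paper's exact aggregation may differ (mean vs. root-mean-square over landmarks), and root-mean-square is assumed here:

```python
import numpy as np

def landmark_nrmse(pred, truth, left_eye, right_eye):
    """Root-mean-square per-landmark Euclidean error, divided by the
    interocular distance and expressed as a percentage."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    per_point = np.linalg.norm(pred - truth, axis=1)  # error per landmark (px)
    rms = np.sqrt(np.mean(per_point ** 2))
    iod = np.linalg.norm(np.asarray(left_eye, float) - np.asarray(right_eye, float))
    return 100.0 * rms / iod

# Toy example: 3 predicted landmarks vs. ground truth, eyes 100 px apart
truth = [[50, 60], [80, 60], [65, 90]]
pred = [[52, 61], [79, 58], [66, 92]]
print(f"NRMSE = {landmark_nrmse(pred, truth, (40, 55), (140, 55)):.2f}")
```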
- Conference Article
6
- 10.56952/arma-2023-0287
- Jun 25, 2023
The traditional Arps decline model fails to predict production from many oil and gas reservoirs due to inherent assumptions such as boundary-dominated flow, as opposed to long transient flow. Fundamentally, this is a time series curve fitting and forecasting problem, and advanced machine learning (ML) algorithms can capture the unusual trends in hydrocarbon production decline. The objective of this study is to develop various ML algorithms, such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), for forecasting future production performance and estimating ultimate recovery (EUR). Decline curve analysis (DCA) is a straightforward and rapid way to estimate future production by suitable curve fitting. However, the traditional Arps method overestimates the production from many reservoirs, which has motivated newer empirical methods such as Power Law Exponential Analysis (PLE by Ilk, 2008), Logistic Growth Analysis (LGA by Clark, 2011), and the Duong Method (DM by Duong, 2011). The outcomes of these recent models also depend on the quality of the data and the reservoir flow regimes. Machine learning is applied to overcome the drawbacks and limitations of the empirical decline curve models. Machine learning algorithms including RNN, LSTM, and GRU are compared. The first 80% of the time-series data is used for training the models and the last 20% for testing. The trained models are employed to forecast future rates and to calculate EUR. A normalized Nash-Sutcliffe model efficiency coefficient (NNSE) and the Normalized Root Mean Squared Error (NRMSE) are selected for assessing the efficacy of the different models; an NSE value close to unity suggests good model performance. The LSTM models have several unique advantages over typical supervised machine learning algorithms, being flexible in handling multiple inputs in a time series.
The ML models developed in this work can be coupled with an economic model considering future oil prices and operational costs. Machine learning is a research area growing quickly across several industries and providing valuable insights, yet machine learning for time series forecasting in the oil and gas industry has not been comprehensively explored. Results from this work will provide the literature with another application perspective, with strong opportunities in production data analysis.
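The 80/20 split described above differs from a random split: for time series, the test window must follow the training window chronologically, with no shuffling. A minimal sketch with invented production rates:

```python
def time_series_split(series, train_frac=0.8):
    """Chronological split: the first train_frac of the record trains the
    model, the remainder tests it. No shuffling, so the test window always
    lies in the future relative to the training data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# Invented monthly production rates (bbl/day) showing a decline trend
rates = [950, 900, 860, 815, 790, 760, 735, 710, 690, 670]
train, test = time_series_split(rates)
print(len(train), len(test))
```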
- Research Article
1
- 10.1080/00103624.2021.1984515
- Oct 6, 2021
- Communications in Soil Science and Plant Analysis
The objective of the present study was to estimate the exchangeable sodium percentage (ESP) of soil from the Bafra plain using easily determined soil characteristics (EC and pH) and artificial intelligence-based models. A total of 448 soil samples were taken from different points in the study area. Artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector regression (SVR) models were developed and compared. The database was randomly divided into training and test data sets (70:30). Coefficient of determination (R2), normalized root mean square error (NRMSE), normalized mean absolute error (NMAE), Nash-Sutcliffe model efficiency (NS), and Akaike's Information Criterion (AIC) were used as statistical performance indicators to assess the accuracy of the models. The findings revealed that both the ANN (R2 = 0.91, NMAE = 0.21, NRMSE = 0.05, NS = 0.91, and AIC = 191.86) and ANFIS (R2 = 0.91, NMAE = 0.21, NRMSE = 0.05, NS = 0.91, and AIC = 195.51) models had greater overall estimation performance than the SVR (R2 = 0.89, NMAE = 0.49, NRMSE = 0.08, NS = 0.74, and AIC = 334.57) model. Comparative assessments revealed that the ANN and ANFIS approaches could successfully be used to estimate ESP from EC and pH data. It was concluded from these findings that artificial intelligence-based techniques could reliably be used to estimate soil ESP as a promising alternative to traditional approaches.
- Research Article
12
- 10.3390/en15186509
- Sep 6, 2022
- Energies
This paper demonstrates the applicability of machine learning algorithms to sand production problems in natural gas hydrate (NGH)-bearing sands, which are regarded as a grave concern for commercialization. The sanding problem hinders the commercial exploration of NGH reservoirs, and common sand production prediction methods require assumptions for complicated mathematical derivations. The main contribution of this paper is to introduce machine learning into the prediction of sand production using data from laboratory experiments. Four main machine learning algorithms were selected: K-Nearest Neighbor, Support Vector Regression, Boosting Tree, and Multi-Layer Perceptron. Training datasets for machine learning were collected from a sand production experiment that considered both the geological parameters and the sand control effect. The machine learning algorithms were evaluated mainly by their mean absolute error and coefficient of determination. The evaluation results showed that the most accurate results under the given conditions came from the Boosting Tree algorithm, while K-Nearest Neighbor had the worst prediction performance. For an ensemble prediction model, Support Vector Regression and Multi-Layer Perceptron could also be applied to the prediction of sand production. The tuning process revealed that the Gaussian kernel was the proper kernel function for improving the prediction performance of SVR. In addition, the best parameters for both the Boosting Tree and Multi-Layer Perceptron were recommended for the accurate prediction of sand production. This paper also includes one case study comparing the prediction results of the machine learning models with classic numerical simulation, which showed the capability of machine learning to accurately predict sand production, especially under stable pressure conditions.