Machine learning approaches for satellite-derived bathymetry in tropical coastal waters: A comparative study from Nha Trang marine protected area, Vietnam
Bathymetry mapping plays a critical role in coastal zone management, marine conservation, and navigation safety. With the increasing availability of high-resolution satellite imagery, such as PlanetScope (3–5 m), remote sensing-based bathymetry retrieval offers a cost-effective and scalable alternative to traditional in-situ surveys. This study explores the capability of PlanetScope imagery to retrieve bathymetry over a wide depth range (approximately −0.5 m to −40 m) in the southern area of the Nha Trang Marine Protected Area (MPA), Vietnam, an ecologically significant and dynamic coastal region. We conduct a comprehensive comparison between traditional approaches, including the Stumpf ratio model and Multiple Linear Regression (MLR), and a suite of advanced machine learning (ML) algorithms: Random Forest (RF), Support Vector Machine (SVM), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), CatBoost (CB), and Gradient Boosting (GB). Among these, RF achieved the highest performance with an R² of 0.85, RMSE of 2.66 m, and MAE of 1.85 m, significantly outperforming the Stumpf model (R² = 0.29) and MLR (R² = 0.57). This study represents one of the most extensive model comparisons to date for satellite-derived bathymetry using PlanetScope data, offering a benchmark for future applications in tropical coastal environments. The results underscore the potential of machine learning to advance spatially detailed and accurate bathymetric mapping from space.
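As a rough illustration of the Stumpf ratio baseline and the three reported metrics, the sketch below fits depth against the log-ratio of two bands and then computes R², RMSE, and MAE. The abstract gives no formula; this assumes the commonly cited log-ratio form, a blue/green band pair, a scaling constant n = 1000, and depths expressed as positive values. The reflectances and depths are invented for illustration and are not the study's data.

```python
import math


def stumpf_ratio(blue, green, n=1000.0):
    """Return ln(n * R_blue) / ln(n * R_green) for one pixel (n is an assumed constant)."""
    return math.log(n * blue) / math.log(n * green)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1.0 - ss_res / ss_tot, rmse, mae


# Hypothetical (blue, green) reflectances and surveyed depths in metres.
pixels = [(0.060, 0.055), (0.052, 0.044), (0.045, 0.035), (0.040, 0.028)]
depths = [2.0, 6.0, 12.0, 20.0]

ratios = [stumpf_ratio(b, g) for b, g in pixels]
m1, m0 = fit_line(ratios, depths)          # depth ~ m1 * ratio + m0
predicted = [m1 * r + m0 for r in ratios]
r2, rmse, mae = regression_metrics(depths, predicted)
print(f"m1={m1:.2f} m0={m0:.2f} R2={r2:.2f} RMSE={rmse:.2f} m MAE={mae:.2f} m")
```

The ratio model has only two tunable coefficients, which may be one reason a flexible learner such as RF can fit a wide depth range more closely.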
- Research Article
8
- 10.1109/access.2024.3362676
- Jan 1, 2024
- IEEE Access
Smartwatches with cutting-edge sensors are becoming commonplace in our daily lives. Despite their widespread use, it can be challenging to interpret accelerometer and gyroscope data efficiently for Human Activity Recognition (HAR). An effective remedy is the incorporation of active learning strategies. This study explores this junction, intending to maximize the use of smartwatch technology across a range of applications. Previous studies on the dataset used here reported lower accuracy, which can make reliable prediction difficult. This paper proposes a novel approach to predicting human activity from the Heterogeneity Human Activity Recognition (HHAR) dataset that combines active learning with six machine learning classifiers: Random Forest (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting (GB), and Light Gradient Boosting Machine (LGBM). We evaluated these models on the HHAR dataset, which was generated using the accelerometer and gyroscope present in smartwatches. The models were evaluated over three iterations; the best accuracy and F1-score reached 99.99%. The results indicate that this approach is more accurate and effective than conventional machine learning approaches.
- Research Article
4
- 10.1177/20552076241272739
- Jan 1, 2024
- Digital health
Although the prevalence of childhood illnesses has significantly decreased, acute respiratory infections continue to be the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms in the two weeks preceding the Ethiopian demographic and health survey. Hence, this study aimed to identify interpretable predictors of acute respiratory infection among under-five children in Ethiopia using machine learning techniques. Secondary data analysis was performed using 2016 Ethiopian demographic and health survey data. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The presence of acute respiratory infection in a child under the age of 5 was the outcome variable, categorized as yes or no. Five ensemble boosting machine learning algorithms, namely adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM), were applied to a total sample of 10,641 children under the age of 5. The Shapley additive explanations technique was used to identify the important features and the effect of each feature on the prediction. After model optimization, the XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and an area under the receiver operating characteristic curve of 86.1%. Child age (in months), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predictors of acute respiratory infection among children under the age of 5 in Ethiopia. The XGBoost classifier was the best predictive model, and the predictors of acute respiratory infection were identified with the help of Shapley additive explanations. 
The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.
- Research Article
- 10.51244/ijrsi.2025.12030044
- Jan 1, 2025
- International Journal of Research and Scientific Innovation
The widespread use of technology has led to an increase in technostress, a phenomenon in which individuals experience stress and anxiety due to their interactions with technology. As social media platforms become increasingly integral to daily life, detecting technostress from online interactions has become a pressing concern and an avenue for enriching research in this area. This study evaluates the performance of selected base models on X (Twitter) data. The study also investigated the effectiveness of a feature extraction technique for improving model performance through data preprocessing. The study used the dataset of X posts (Sentiment140) obtained from Stanford University. The extracted features were used to train and evaluate four base models: Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), and Light Gradient Boosting Machine (LGBM). The performance of each model was evaluated based on accuracy, precision, recall, F1-score, and the Kappa statistic. The RF model outperformed the other base models, with accuracy, precision, recall, F1-score, and Kappa values of 88.03%, 85.98%, 85.68%, 85.79%, and 79.81%, respectively. The results highlight the importance of preprocessing and feature extraction techniques in improving model performance, contribute to the development of more effective technostress detection systems, and provide insights into the application of machine learning algorithms for analysing online interactions.
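The Kappa statistic reported alongside accuracy corrects observed agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa follows, assuming that is the variant the authors used (the abstract does not say); the label lists are invented for illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement).

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal label frequencies, summed over labels.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = technostress, 0 = no technostress.
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]
print(cohen_kappa(labels, predictions))
```

Because chance agreement is subtracted, kappa is typically lower than raw accuracy, which is consistent with the gap between the accuracy and Kappa values quoted above.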
- Research Article
- 10.35629/5252-07032434
- Mar 1, 2025
- International Journal of Advances in Engineering and Management
The widespread use of technology has led to an increase in technostress, a phenomenon in which individuals experience stress and anxiety due to their interactions with technology. As social media platforms become increasingly integral to daily life, detecting technostress from online interactions has become a pressing concern and an avenue for enriching research in this area. This study evaluates the performance of a meta-learner strategy using a Support Vector Classifier, following the implementation of selected base models on X (Twitter) data. The study also investigated the effectiveness of a feature extraction technique for improving model performance through data preprocessing, including lemmatization and polarity scoring. The study used the dataset of X posts (Sentiment140) obtained from Stanford University. The extracted features were used to train and evaluate four base models: Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), and Light Gradient Boosting Machine (LGBM). The outputs of the base models were then used as meta-features for the meta-learner. The stacked ensemble substantially improved the detection of technostress across the evaluation metrics, with accuracy, precision, recall, F1-score, and Kappa values of 97.03%, 96.88%, 93.92%, 91.63%, and 87.60%, respectively. The results highlight the importance of stacked ensembling in improving model performance, contribute to the development of more effective technostress detection systems, and provide insights into the application of machine learning algorithms for analysing online interactions.
- Research Article
17
- 10.1016/j.jgsce.2023.204916
- Feb 3, 2023
- Gas Science and Engineering
Productivity prediction in the Wolfcamp A and B using weighted voting ensemble machine learning method
- Research Article
13
- 10.1038/s41598-023-43211-w
- Sep 25, 2023
- Scientific Reports
Although the goal of rectal cancer treatment is to restore gastrointestinal continuity, some patients with rectal cancer develop a permanent stoma (PS) after sphincter-saving operations. Although many studies have identified the risk factors and causes of PS, few have precisely predicted the probability of PS formation before surgery. This study aimed to validate whether an artificial intelligence model can accurately predict PS formation in patients with rectal cancer after sphincter-saving operations. Patients with rectal cancer who underwent a sphincter-saving operation at Taipei Medical University Hospital between January 1, 2012, and December 31, 2021, were retrospectively included in this study. A machine learning technique was used to predict whether a PS would form after a sphincter-saving operation. We included 19 routinely available preoperative variables in the artificial intelligence analysis. To evaluate the performance of the model, 6 metrics were used: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve. In our classification pipeline, the data were randomly divided into a training set (80% of the data) and a validation set (20% of the data). The artificial intelligence models were trained using the training dataset, and their performance was evaluated using the validation dataset. Synthetic minority oversampling was used to address the class imbalance. A total of 428 patients were included, and the PS rate was 13.6% (58/428) in the training set. The logistic regression (LR), Gaussian Naïve Bayes (GNB), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), random forest, decision tree and light gradient boosting machine (LightGBM) algorithms were employed. 
The accuracies of the logistic regression (LR), Gaussian Naïve Bayes (GNB), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), random forest (RF), decision tree (DT) and light gradient boosting machine (LightGBM) models were 70%, 76%, 89%, 93%, 95%, 79% and 93%, respectively. The area under the receiver operating characteristic curve values were 0.79 for the LR model, 0.84 for the GNB model, 0.95 for the XGB model, 0.95 for the GB model, 0.99 for the RF model, 0.79 for the DT model and 0.98 for the LightGBM model. The key predictors identified were the distance of the lesion from the anal verge, clinical N stage, age, sex, American Society of Anesthesiologists score, and preoperative albumin and carcinoembryonic antigen levels. Integration of artificial intelligence with available preoperative data can potentially predict stoma outcomes after sphincter-saving operations. Our model exhibited excellent predictive ability and can improve the process of obtaining informed consent.
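Five of the six performance metrics named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows one way to compute them; the function name and the example labels are hypothetical, and the code does not guard against an empty class (which would divide by zero).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical outcomes: 1 = permanent stoma, 0 = no permanent stoma.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
for name, value in binary_metrics(actual, predicted).items():
    print(f"{name}: {value:.3f}")
```

When the positive class is rare, as with a PS rate near 14%, accuracy alone can look high even if sensitivity is poor, which is a common reason for reporting all of these metrics together.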
- Research Article
3
- 10.1371/journal.pone.0314988
- Dec 9, 2024
- PLOS ONE
The increasing complexity of diagnostic imaging often leads to misinterpretations and diagnostic errors, particularly in critical conditions such as pneumothorax. This study addresses the pressing need for improved diagnostic accuracy in CT scans by developing an intelligent model that leverages radiomics features and machine learning techniques. By enhancing the detection of pneumothorax, this research aims to mitigate diagnostic errors and accelerate image interpretation, ultimately improving patient outcomes. Data used in this study were extracted from the medical records of 175 patients with suspected pneumothorax. The collected images were preprocessed in MATLAB. Radiomics features were extracted from each image, and the machine learning models were then trained on these features. The machine learning algorithms used were Gradient Boosting Machine (GBM), eXtreme Gradient Boosting (XGBoost), and LightGBM (LGBM). To evaluate model performance, several criteria were calculated: precision, accuracy, specificity, sensitivity, F1 score, area under the receiver operating characteristic (ROC) curve (AUC), and misclassification rate. In terms of accuracy, the GBM model achieved the highest performance at 98.97%, followed closely by the XGBoost model at 98.29%. For precision, the GBM model outperformed the other models, with a value of 99.55%. Regarding sensitivity, all three models (GBM, XGBoost, and LGBM) demonstrated strong performance, with values of 99%, 99%, and 100%, respectively, indicating minimal variation among them. The artificial intelligence models used in this study have significant potential to enhance patient care by supporting radiologists and other clinicians in the diagnosis of pneumothorax. 
These models can facilitate the prioritization of positive cases, expedite evaluations, and ultimately improve patient outcomes.
- Research Article
22
- 10.1016/j.eswa.2024.125836
- Nov 21, 2024
- Expert Systems With Applications
Machine learning-driven prediction of tensile strength in 3D-printed PLA parts
- Research Article
5
- 10.2147/jir.s471626
- Sep 1, 2024
- Journal of inflammation research
Machine learning (ML) is increasingly used in medical predictive modeling, but there are no studies applying ML to predict prognosis in Guillain-Barré syndrome (GBS). The medical records of 223 patients with GBS were analyzed to construct models predicting patient prognosis. Least Absolute Shrinkage and Selection Operator (LASSO) was used to filter the variables. Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), k-Nearest Neighbour (KNN), Naive Bayes (NB), Neural Network (NN), Light Gradient Boosting Machine (LGBM), and Logistic Regression (LR) were used to construct the predictive models. Clinical data from 55 GBS patients were used to validate the models. SHapley Additive exPlanations (SHAP) analysis was used to explain the model. Single-sample gene set enrichment analysis (ssGSEA) was used for immune cell infiltration analysis. The areas under the curve (AUCs) of the 8 ML algorithms (DT, RF, XGBoost, KNN, NB, NN, LGBM, and LR) were 0.75, 0.896, 0.874, 0.666, 0.742, 0.765, 0.869, and 0.744, respectively. The accuracy of XGBoost (0.852) was the highest, followed by LGBM (0.803) and RF (0.758), with F1 scores of 0.832, 0.794, and 0.667, respectively. Analysis of the validation set showed AUCs of 0.839, 0.919, and 0.733 for RF, XGBoost, and LGBM, respectively. SHAP analysis showed that the SHAP values of blood neutrophil/lymphocyte ratio (NLR), age, mechanical ventilation, hyporeflexia, and abnormal glossopharyngeal vagus nerve were 0.821, 0.645, 0.517, 0.401, and 0.109, respectively. The combination of NLR, age, mechanical ventilation, hyporeflexia, and abnormal glossopharyngeal vagus nerve has good predictive value for short-term prognosis in patients with GBS.
- Research Article
3
- 10.1002/est2.70133
- Feb 1, 2025
- Energy Storage
Lithium-ion cells have become an important part of our daily lives. They are used to power mobile phones, laptops and, more recently, electric vehicles (both two- and four-wheelers). The chemical behavior of the cells is complex and non-linear. For reliable and sustainable use of the cells in practical applications, it is imperative to predict the rate at which their capacity will degrade. More importantly, the lifetime of the cells must be predicted at an early stage, which would accelerate development and design optimization of the cells. However, most existing methods cannot predict the lifetime at an early stage, since there is only a weak correlation between cell capacity and lifetime. In this study, for accurate forecasting of battery lifetime, the patterns of parameters such as cell current, voltage, temperature, charging time, internal resistance, and capacity were examined during the charging and discharging cycles of the cell. Twelve manually crafted features were prepared from these parameters. The feature dataset was created using the raw data of the first 100 cycles of 124 cells. Six ensemble and non-ensemble machine learning algorithms, namely multiple linear regression (MLR), decision tree, support vector machine (SVM), gradient boosting machine (GBM), light gradient boosting machine (LGBM), and extreme gradient boosting (XGBoost), were trained on the features to predict the cycle life of the cells. The R² and root mean squared error (RMSE) values of MLR, decision tree, SVM, GBM, LGBM, and XGBoost were 0.72 and 201, 0.83 and 155, 0.85 and 146, 0.92 and 100, 0.9 and 112, and 0.94 and 95, respectively. The prediction accuracy of lithium-ion cell lifetime was best with the XGBoost algorithm. This shows that only the first 100 cycles are required for accurately predicting the number of cycles a lithium-ion cell can work for. 
Lastly, the results of the study were compared with the available studies in the literature. Three studies were chosen, and the RMSE of the method proposed in this study was found to be higher than the three studies by 43, 17, and 20. Therefore, the proposed method is a suitable option for predicting the lifetime of lithium-ion cells during the early stages of their development.
- Research Article
20
- 10.3390/life12040604
- Apr 18, 2022
- Life
This multicenter retrospective cohort study of term nulliparous women who underwent labor was conducted to develop an automated machine learning model for predicting emergent cesarean section (CS) before the onset of labor. Nine machine learning methods (logistic regression, random forest, Support Vector Machine (SVM), gradient boosting, extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), k-nearest neighbors (KNN), Voting, and Stacking) were applied and compared for prediction of emergent CS during active labor. External validation was performed using a nationwide multicenter dataset for Korean fetal growth. A total of 6549 term nulliparous women were included in the analysis, and the emergent CS rate was 16.1%. The C-statistic values for KNN, Voting, XGBoost, Stacking, gradient boosting, random forest, LGBM, logistic regression, and SVM were 0.6, 0.69, 0.64, 0.59, 0.66, 0.68, 0.68, 0.7, and 0.69, respectively. The logistic regression model showed the best predictive performance, with an accuracy of 0.78. The machine learning model identified nine significant variables, including maternal age, height, pre-pregnancy weight, pregnancy-associated hypertension, gestational age, and fetal sonographic findings. The C-statistic for the logistic regression model in the external validation set (1391 term nulliparous women) was 0.69, with an overall accuracy of 0.68, a specificity of 0.83, and a sensitivity of 0.41. Machine learning algorithms with clinical and sonographic parameters at near term could be useful tools to predict individual risk of emergent CS during active labor in nulliparous women.
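For a binary outcome, the C-statistic is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, which equals the area under the ROC curve. The sketch below computes it by direct pairwise comparison; the function name and the example scores are hypothetical, and the quadratic loop is meant for illustration rather than large cohorts.

```python
def c_statistic(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                concordant += 1.0
            elif pos_score == neg_score:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = emergent CS) and predicted risks.
outcomes = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.5, 0.1]
print(c_statistic(outcomes, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values of 0.59 to 0.70 quoted above indicate modest discrimination.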
- Research Article
- 10.1007/s44163-026-01048-y
- Mar 9, 2026
- Discover Artificial Intelligence
This study investigates advanced machine learning techniques for predicting heart disease, emphasizing the critical role of early diagnosis in cardiovascular diseases (CVDs), which remain among the leading causes of mortality worldwide. Early and accurate detection can substantially reduce mortality rates and improve public health outcomes. In this context, advanced machine learning algorithms, such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF), have emerged as powerful tools for analyzing complex patterns in medical data. The performance and accuracy of 13 different machine learning algorithms were evaluated in a binary classification task aimed at distinguishing between healthy individuals and those at high risk for heart disease. The combined dataset consisted of 1328 samples, with 50.08% classified as high-risk patients and 49.92% as healthy individuals, providing a balanced distribution for effective machine learning analysis. The dataset included 14 features, such as age, gender, blood pressure, cholesterol, and other health-related factors. Model performance was assessed using 13 different evaluation metrics, and results were reported as mean (m), standard deviation (SD), and root mean square error (RMSE). Pairwise comparisons of algorithms based on accuracy were performed using Significance Testing Between Models to evaluate statistically significant differences. Additionally, an exploratory data analysis was conducted to assess the influence of individual features on model outputs. The findings indicate that the RF algorithm achieved high accuracy (93.82 ± 1.64%) as well as high sensitivity, while models such as Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), Extra Trees Classifier (ETC), and Decision Tree (DT) consistently ranked second and third across evaluation metrics. 
These results demonstrate that leveraging machine learning techniques enhances diagnostic accuracy, facilitates rapid identification of high-risk individuals, and reduces healthcare costs. Ultimately, this study provides a foundation for developing innovative prediction methods and management strategies for cardiovascular diseases.
- Research Article
- 10.55606/jeei.v5i3.5742
- Oct 30, 2025
- Journal of Engineering, Electrical and Informatics
Thyroid illness is one of the most prevalent medical problems and has a direct impact on a person's physical and emotional well-being. This study applies machine learning (ML) to the 2017–2020 NHANES data, an extensive dataset covering 6,992 people and XX characteristics. The goal is to improve the early identification and classification of vulnerable people. The machine learning techniques used include K-Nearest Neighbor (KNN), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (EGB), LightGBM (LGBM), Multi-Layer Perceptron (MLP), and Gradient Boosting. Evaluation of these algorithms showed that RF, EGB, and LGBM achieved the highest accuracy, reaching 0.90. Among them, RF demonstrated the highest precision at 0.98, showing its ability to correctly identify individuals at risk with a high degree of confidence. KNN had the highest recall at 0.73, highlighting its effectiveness in capturing a substantial proportion of true positive cases. EGB had the highest F1-score, indicating a good balance between precision and recall, and also the highest Area Under the Curve (AUC) at 0.82, underscoring its robust predictive capabilities. This research underscores the pivotal role of ML algorithms in predicting and classifying thyroid disease risk, offering valuable insights for early intervention and personalized healthcare strategies. The high accuracy, precision, and recall values observed with RF, EGB, and LGBM suggest their potential as powerful tools for improving diagnostic capabilities in thyroid disease, contributing to more effective and timely patient care. As advancements in machine learning continue, the integration of these techniques into healthcare frameworks holds promise for enhancing our understanding and management of thyroid disorders.
- Research Article
- 10.1038/s41598-025-32350-x
- Dec 16, 2025
- Scientific Reports
This study presents a hybrid approach combining experimental evaluation and machine learning modeling for the compressive strength (CS) estimation of natural fiber-reinforced concrete (NFRC), which utilizes jute, coir, and bamboo fibers. A dataset of 444 concrete mix designs was compiled from the literature using seven input variables. Six advanced machine learning (ML) algorithms: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Random Forest, Gradient Boosting, Decision Tree, and K-Nearest Neighbors were trained and tested with proper hyperparameter optimization. Among these, the LGBM model demonstrated superior accuracy in prediction with an R2 of 0.8637 and the lowest root mean square error of 4.19 MPa on the testing sets. To ensure interpretability and mix design optimization, SHapley Additive Explanations (SHAP) and Partial Dependence Plot (PDP) analyses were incorporated. The dominant predictors of CS were found to be cement content, water, coarse aggregate, and supplementary cementitious materials, while the curing period and fiber content also showed a small but meaningful effect. To further validate the modeling outcomes, experimental investigations were conducted by developing 10 mix combinations with varying percentages of coir fiber, which were then subjected to compressive strength and scanning electron microscopy tests. The outcomes also validated that the incorporation of natural fiber up to 0.75% gradually increased the CS to a maximum of 41.3% after 28 days of curing, while further addition reduced performance. The dual-approach-based (ML and experimental) outcomes could assist in the sustainable advancement of the infrastructure industry as a potential solution for cost-effective, large-scale production of NFRC.
- Research Article
4
- 10.37385/jaets.v4i2.1925
- Jun 5, 2023
- Journal of Applied Engineering and Technological Science (JAETS)
Recent years have seen the rapid deployment of Artificial Intelligence (AI), which allows systems to make intelligent decisions. AI breakthroughs could radically change the operations of modern libraries. However, introducing AI in modern libraries is a challenging task. This research explores the potential for smart libraries to improve the quality of user services through machine learning (ML) techniques. The proposed work investigates machine learning methods such as Random Forest (RF) and boosting algorithms, including Light Gradient Boosting Machine (LGBM), Histogram-based Gradient Boosting (HGB), Extreme Gradient Boosting (XGB), CatBoost (CB), AdaBoost (AB), and Gradient Boosting (GB), for the task of identifying and classifying favorite books, and compares their performance. Comprehensive experiments on the publicly available dataset (Art Garfunkel's Library) show that the proposed model can effectively handle this task. Experimental results show that LGBM achieved the best performance, with an accuracy of 94.9367%, ahead of Random Forest and the other boosting algorithms. This empirical work demonstrates AI adoption in libraries using machine learning techniques. To the best of our knowledge, we are the first to develop an intelligent application for the modern library that automatically identifies and classifies favorite books.