Sentiment Analysis of Healthcare Services at RSUD Soe Using Machine Learning and LDA
This study analyzes public perceptions of RSUD Soe's healthcare services using sentiment analysis with four machine learning algorithms, with Naïve Bayes achieving 82.14% accuracy. LDA identified twelve key themes, mostly positive, guiding targeted improvements.
Healthcare services constitute a crucial aspect in improving public well-being. Every individual has the right to receive healthcare services that are of high quality, safe, efficient, and affordable. This study aims to identify and analyze public perceptions and sentiments toward healthcare services at RSUD Soe, as well as to evaluate the performance of several machine learning methods in classifying such sentiments. The data were collected from 278 respondents through a Likert-scale questionnaire that represents perceptions and levels of satisfaction regarding various service aspects. Sentiment analysis was conducted using four machine learning algorithms, namely Naïve Bayes, C4.5, Random Forest, and Support Vector Machine. The results indicate that Naïve Bayes achieved the highest accuracy of 82.14 percent, followed by SVM at 80 percent, Random Forest at 79 percent, and C4.5 at 73.21 percent. This study also applied the Latent Dirichlet Allocation method to identify the main themes within public feedback. LDA generated twelve topics reflecting key issues such as waiting time, availability of medical personnel, facility cleanliness, and the attitudes of healthcare staff. The majority of comments exhibited positive sentiment, particularly concerning staff friendliness and service quality. These findings were used to formulate improvement recommendations, including enhancing service quality, increasing the number of medical personnel, and optimizing facilities. This research demonstrates that a data-driven quantitative approach is effective in evaluating healthcare service quality and supporting more targeted decision-making. The results are expected to assist RSUD Soe in continuously and effectively improving service quality.
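As a rough illustration of the kind of classifier the abstract above reports as most accurate, here is a minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing. It is not the study's pipeline: the toy comments, the labels, and the function names `train_nb` and `predict_nb` are all invented for this example, and whitespace tokenization is an assumption.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # assumption: simple whitespace tokens
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing keeps unseen class/word pairs non-zero
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only (not from the study).
toy_docs = [
    "staff friendly and helpful",
    "clean room friendly nurse",
    "long waiting time",
    "waiting too long dirty room",
]
toy_labels = ["positive", "positive", "negative", "negative"]
model = train_nb(toy_docs, toy_labels)
```

Accuracy figures such as those quoted in the abstract would come from comparing `predict_nb` outputs against held-out labels. How the study derived its labels from the Likert responses is not stated in the abstract.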
- Research Article
- 10.1371/journal.pone.0343624
- May 11, 2026
- PLOS One
Recent advances in statistical and machine learning (ML) methods have improved the prediction of soil attributes at fine spatial scales, yet the comparative performance and reliability of these techniques remain unclear. This study compared Ordinary Kriging (OK), Inverse Distance Weighting (IDW), and ML algorithms in predicting and spatializing soil attributes, while also evaluating prediction uncertainty and computational processing time. Conducted in Minas Gerais State (Brazil), the analysis used Euclidean-distance-based predictors derived from X-Y coordinates and regular grids with 5, 7, and 10 divisions. Soil attribute maps (CEC, phosphorus, sand, and clay) were generated using OK, IDW, Random Forest (RF), Cubist, Support Vector Machine (SVM), and Earth. Model performance was assessed using R2, RMSE, MAE, and the coefficient of variation. IDW and OK showed the lowest predictive accuracy (R2 = 0.52–0.58), whereas ML methods, especially RF and SVM, achieved superior performance (R2 = 0.62–0.70). Among ML algorithms, Earth performed worst, while RF produced the highest accuracy for all attributes except sand, for which SVM performed best. Processing time was shortest for IDW, followed by OK; among ML models, Earth was fastest, followed by RF, SVM, and Cubist. Larger regular grids improved ML prediction and spatialization but increased computational cost. ML methods thus outperform traditional geostatistical interpolators, benefiting from the use of numerous covariates and flexible algorithmic structures, although they require greater computational time. These findings demonstrate the robustness and practical potential of ML approaches for soil attribute mapping.
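For readers unfamiliar with the IDW baseline named in the abstract above, the following is a minimal sketch of inverse distance weighting in plain Python. The sample points, the default `power=2.0`, and the function name `idw_predict` are assumptions for illustration; the study's actual settings are not given in the abstract.

```python
import math


def idw_predict(known, query, power=2.0):
    """Inverse distance weighting.

    known: list of (x, y, value) sample points
    query: (x, y) location to predict
    power: distance exponent (2.0 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value  # query coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample points: (x, y, measured attribute value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_predict(samples, (1.0, 0.0))
```

Each prediction is a weighted mean of the observations, so IDW cannot predict outside the observed range. This is one plausible reason, among others, why covariate-driven ML models may score higher.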
- Research Article
- 10.1002/fsat.3304_6.x
- Dec 1, 2019
- Food Science and Technology
Sensors support machine learning
- Research Article
3
- 10.15837/ijccc.2021.6.4549
- Dec 3, 2021
- INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
In the era of artificial intelligence, machine learning methods are successfully used in various fields. Machine learning has attracted extensive attention from investors in the financial market, especially in stock price prediction. However, one criticism of the machine learning methods used in stock price prediction is that they are black-box models that are difficult to interpret. In this paper, we focus on predicting future stock prices from historical stock prices using machine learning and deep learning methods, such as support vector machine (SVM), random forest (RF), Bayesian classifier (BC), decision tree (DT), multilayer perceptron (MLP), convolutional neural network (CNN), bi-directional long short-term memory (BiLSTM), the embedded CNN, and the embedded BiLSTM. Firstly, we manually design several financial time series where the future price correlates with the historical stock prices in pre-designed modes, namely the curve-shape-feature (CSF) and the non-curve-shape-feature (NCSF) modes. In the CSF mode, the future prices can be extracted from the curve shapes of the historical stock prices. Conversely, in the NCSF mode, they cannot. Secondly, we apply various algorithms to those pre-designed and real financial time series. We find that the existing machine learning and deep learning algorithms fail in stock price prediction because, in real financial time series, less information about future prices is contained in the CSF mode, and perhaps more information is contained in the NCSF mode. Various machine learning and deep learning algorithms are good at handling the CSF in historical data and have been successfully applied in image recognition and natural language processing. However, they are inappropriate for stock price prediction on account of the NCSF. Therefore, accurate stock price prediction is the key to successful investment, and new machine learning algorithms handling the NCSF series are needed.
- Research Article
14
- 10.1186/s41043-024-00647-8
- Oct 12, 2024
- Journal of Health, Population and Nutrition
Background and aims: The birth weight of a newborn is a crucial factor that affects their overall health and future well-being. Low birth weight (LBW), which the World Health Organization defines as a birth weight of less than 2,500 g, is a widespread global issue. LBW can have severe negative consequences for an individual’s health, including neonatal mortality and various health concerns throughout life. To address this problem, this study used BDHS 2017–2018 data to uncover important aspects of LBW with a variety of machine learning (ML) approaches and to determine the best feature selection technique and the best predictive ML model. Methods: To select the key features, the Boruta algorithm and the wrapper method were used. Logistic Regression (LR) was used as the traditional method, and several machine learning classifiers were then applied, including DT (Decision Tree), SVM (Support Vector Machine), NB (Naïve Bayes), RF (Random Forest), XGBoost (eXtreme Gradient Boosting), and AdaBoost (AB, Adaptive Boosting), to determine the best model for predicting LBW. Model performance was evaluated based on specificity, sensitivity, accuracy, F1 score, and AUC value. Results: The Boruta algorithm identified eleven significant features: respondent’s age, highest education level, educational attainment, wealth index, age at first birth, weight, height, BMI, age at first sexual intercourse, birth order number, and whether the child is a twin. Using the features selected by Boruta, the performance of the traditional LR and of the ML methods (DT, SVM, NB, RF, XGBoost, and AB) was evaluated. LR had a specificity, sensitivity, accuracy, and F1 score of 0.85, 0.5, 85.15%, and 0.915, respectively. For the ML methods DT, SVM, NB, RF, XGBoost, and AB, the respective accuracy values were 85.35%, 85.15%, 84.54%, 81.18%, and 84.41%.
Based on the specificity, sensitivity, accuracy, F1 score, and AUC, RF (specificity = 0.99, sensitivity = 0.58, accuracy = 85.86%, F1 score = 0.9243, AUC = 0.549) outperformed the other methods. The performance of both the classical (LR) and the ML models improved dramatically when important features were extracted using the wrapper method. The LR method identified five significant features, with a specificity, sensitivity, accuracy, and F1 score of 0.87, 0.33, 87.12%, and 0.9309, respectively. The region, whether the infant is a twin, and cesarean delivery were the three key features discovered by the DT and RF models, which were implemented using the wrapper technique. All three models had the identical F1 score of 0.9318. However, “child is twin” was recognized as a significant feature by the SVM, NB, and AB models, with an F1 score of 0.9315. Ultimately, with an F1 score of 0.9315, the XGBoost model recognized “child is twin” and “age at first sex” as relevant features. Random Forest again beat the other approaches in this instance. Conclusions: The study identifies the wrapper method as the optimal feature selection technique. The ML methods outperform the traditional method, with Random Forest (RF) being the most effective predictive model for LBW. The study suggests that policymakers in Bangladesh can reduce the incidence of low birth weight by considering the identified risk factors.
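The evaluation measures named in the abstract above (specificity, sensitivity, accuracy, F1 score) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical, the function name `binary_metrics` is invented for this example, and which class the study treated as "positive" is not stated in the abstract.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class appears at least once
    and at least one positive prediction was made).
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
scores = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

With these counts, sensitivity is 40/50, specificity is 45/50, accuracy is 85/100, and F1 equals 2·TP / (2·TP + FP + FN) = 80/95. A model can combine high accuracy with low sensitivity when the classes are imbalanced, which is why the abstract reports several measures side by side.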
- Research Article
56
- 10.3390/su15065341
- Mar 17, 2023
- Sustainability
Air pollution in Macau has become a serious problem following the Pearl River Delta’s (PRD) rapid industrialization that began in the 1990s. With this in mind, Macau needs an air quality forecast system that accurately predicts pollutant concentration during the occurrence of pollution episodes to warn the public ahead of time. Five different state-of-the-art machine learning (ML) algorithms were applied to create predictive models to forecast PM2.5, PM10, and CO concentrations for the next 24 and 48 h, which included artificial neural networks (ANN), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and multiple linear regression (MLR), to determine the best ML algorithms for the respective pollutants and time scale. The diurnal measurements of air quality data in Macau from 2016 to 2021 were obtained for this work. The 2020 and 2021 datasets were used for model testing, while the four-year data before 2020 and 2021 were used to build and train the ML models. Results show that the ANN, RF, XGBoost, SVM, and MLR models were able to provide good performance in building up a 24-h forecast with a higher coefficient of determination (R2) and lower root mean square error (RMSE), mean absolute error (MAE), and biases (BIAS). Meanwhile, all the ML models in the 48-h forecasting performance were satisfactory enough to be accepted as a two-day continuous forecast even if the R2 value was lower than the 24-h forecast. The 48-h forecasting model could be further improved by proper feature selection based on the 24-h dataset, using the Shapley Additive Explanations (SHAP) value test and the adjusted R2 value of the 48-h forecasting model. In conclusion, the above five ML algorithms were able to successfully forecast the 24 and 48 h of pollutant concentration in Macau, with the RF and SVM models performing the best in the prediction of PM2.5 and PM10, and CO in both 24 and 48-h forecasts.
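The four scores used in the abstract above to compare forecasts (R2, RMSE, MAE, and BIAS) can be computed as in the sketch below. The numbers are hypothetical and the function name `regression_scores` is invented. R2 is taken here as 1 − SS_res/SS_tot and bias as the mean of predicted minus observed, which are common conventions but are assumptions about what the paper used.

```python
import math


def regression_scores(observed, predicted):
    """R2, RMSE, MAE and mean bias for paired observations and forecasts."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_observed = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,  # positive means over-prediction on average
    }


# Hypothetical concentrations, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
scores = regression_scores(observed, predicted)
```

Here the errors are 2, −2, 3, −1, so MAE is 2.0 and bias is 0.5. RMSE is √4.5, larger than MAE because squaring penalizes the single error of 3 more heavily.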
- Research Article
28
- 10.12989/scs.2020.37.2.193
- Jan 1, 2020
- Steel and Composite Structures
In this paper, the efficiency of five Machine Learning (ML) methods consisting of Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB) for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames is compared. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for the training of the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The comparison between the five ML methods is made in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models, respectively. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.
- Research Article
21
- 10.1371/journal.pone.0296625
- Feb 13, 2024
- PLOS ONE
Undernutrition among children under the age of five is a major public health concern, especially in developing countries. This study aimed to use machine learning (ML) algorithms to predict undernutrition and identify its associated factors. Secondary data analysis of the 2017 Multiple Indicator Cluster Survey (MICS) was performed using R and Python. The main outcomes of interest were undernutrition (stunting: height-for-age (HAZ) < -2 SD; wasting: weight-for-height (WHZ) < -2 SD; and underweight: weight-for-age (WAZ) < -2 SD). Seven ML algorithms were trained and tested: linear discriminant analysis (LDA), logistic model, support vector machine (SVM), random forest (RF), least absolute shrinkage and selection operator (LASSO), ridge regression, and extreme gradient boosting (XGBoost). The ML models were evaluated using the accuracy, confusion matrix, and area under the curve (AUC) receiver operating characteristics (ROC). In total, 8564 children were included in the final analysis. The average age of the children was 926 days, and the majority were females. The weighted prevalence rates of stunting, wasting, and underweight were 17%, 7%, and 12%, respectively. The accuracies of all the ML models for wasting were (LDA: 84%; Logistic: 95%; SVM: 92%; RF: 94%; LASSO: 96%; Ridge: 84%, XGBoost: 98%), stunting (LDA: 86%; Logistic: 86%; SVM: 98%; RF: 88%; LASSO: 86%; Ridge: 86%, XGBoost: 98%), and for underweight were (LDA: 90%; Logistic: 92%; SVM: 98%; RF: 89%; LASSO: 92%; Ridge: 88%, XGBoost: 98%). The AUC values of the wasting models were (LDA: 99%; Logistic: 100%; SVM: 72%; RF: 94%; LASSO: 99%; Ridge: 59%, XGBoost: 100%), for stunting were (LDA: 89%; Logistic: 90%; SVM: 100%; RF: 92%; LASSO: 90%; Ridge: 89%, XGBoost: 100%), and for underweight were (LDA: 95%; Logistic: 96%; SVM: 100%; RF: 94%; LASSO: 96%; Ridge: 82%, XGBoost: 82%). Age, weight, length/height, sex, region of residence and ethnicity were important predictors of wasting, stunting and underweight. 
The XGBoost model was the best model for predicting wasting, stunting, and underweight. The findings showed that different ML algorithms could be useful for predicting undernutrition and identifying important predictors for targeted interventions among children under five years in Ghana.
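The three outcome definitions quoted in the abstract above (z-scores below −2 SD) translate directly into labels. The sketch below assumes the z-scores have already been computed against a growth reference; the function name `classify_undernutrition` and the example values are hypothetical.

```python
def classify_undernutrition(haz, whz, waz, cutoff=-2.0):
    """Label one child from precomputed anthropometric z-scores.

    haz: height-for-age z-score, whz: weight-for-height z-score,
    waz: weight-for-age z-score. The -2 SD cutoff is the one quoted in
    the abstract; deriving the z-scores themselves is outside this sketch.
    """
    return {
        "stunting": haz < cutoff,
        "wasting": whz < cutoff,
        "underweight": waz < cutoff,
    }


# Hypothetical child: short for age, but weight measures within range.
example = classify_undernutrition(haz=-2.3, whz=-0.5, waz=-1.8)
```

Because the three labels are derived independently, a child can fall into none, one, or several categories. This is consistent with the abstract reporting a separate prevalence and separate model scores for each outcome.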
- Research Article
18
- 10.1007/s00704-023-04725-5
- Nov 8, 2023
- Theoretical and Applied Climatology
Geospatial atmospheric data is the input variable of a wide range of hydrological and ecological spatial models, many of which are oriented towards improving socioeconomic and environmental sustainability. Here, we provide an evaluation of machine learning (ML) methods for the spatial interpolation of annual precipitation and of minimum and maximum temperatures for a mountain range, in this case the Pyrenees. To this end, this work compares the performance and accuracy of multiple linear regressions (MLR) and generalized additive models (GAM) against five ML methods (K-Nearest Neighbors, Support Vector Machines, Neural Networks, Stochastic Gradient Boosting and Random Forest). The ML algorithms outperformed the MLR and GAM independently of the predictor variables used, the geographical sector analyzed or the elevation range. Overall, the differences between ML algorithms are negligible. Random Forest shows a slightly higher than average accuracy for the spatial interpolation of precipitation (R2 = 0.93; MAE = 70.44 mm), whereas Stochastic Gradient Boosting is the best ML method for the spatial interpolation of the mean maximum annual temperature (R2 = 0.96, MAE = 0.43 ºC). Stochastic Gradient Boosting, Neural Networks and Random Forest have similar performances for the spatial interpolation of the mean minimum annual temperature (R2 = 0.98, MAE = 0.19 ºC). The results presented here can be valuable for past and future climate spatial analysis, environmental niche modelling, hydrological projections, and water management.
- Research Article
21
- 10.1016/j.atech.2023.100193
- Jan 31, 2023
- Smart Agricultural Technology
Mapping cropland extent using sentinel-2 datasets and machine learning algorithms for an agriculture watershed
- Research Article
12
- 10.1186/s12931-024-02911-1
- Jul 24, 2024
- Respiratory Research
Background: The use of machine learning (ML) methods would improve the diagnosis of small airway dysfunction (SAD) in subjects with chronic respiratory symptoms and preserved pulmonary function (PPF). This paper evaluated the performance of several ML algorithms combined with impulse oscillometry (IOS) analysis to aid in the diagnosis of respiratory changes in SAD. We also identified the best configuration for this task. Methods: IOS and spirometry were measured in 280 subjects, including a healthy control group (n = 78), a group with normal spirometry (n = 158) and a group with abnormal spirometry (n = 44). Various supervised ML algorithms and feature selection strategies were examined, including Support Vector Machines (SVM), Random Forests (RF), Adaptive Boosting (ADABOOST), Naive Bayesian (BAYES), and K-Nearest Neighbors (KNN). Results: The first experiment of this study demonstrated that the best oscillometric parameter (BOP) was R5, with an AUC value of 0.642, when comparing a healthy control group (CG) with patients in the group without lung volume-defined SAD (PPFN). The AUC value of the BOP was 0.769 when comparing the control group with patients with spirometry-defined SAD (PPFA) in the PPF population. In the second experiment, the ML technique was used. In CGvsPPFN, RF and ADABOOST had the best diagnostic results (AUC = 0.914, 0.915), with significantly higher accuracy compared to BOP (p < 0.01). In CGvsPPFA, RF and ADABOOST had the best diagnostic results (AUC = 0.951, 0.971) and significantly higher diagnostic accuracy (p < 0.01). In the third, fourth and fifth experiments, different feature selection techniques allowed us to find the best IOS parameters (R5, (R5-R20)/R5 and Fres).
The results demonstrate that the performance of ADABOOST remained essentially unaltered following the application of the feature selector, whereas the diagnostic accuracy of the remaining four classifiers (RF, SVM, BAYES, and KNN) is marginally enhanced. Conclusions: IOS combined with ML algorithms provides a new method for diagnosing SAD in subjects with chronic respiratory symptoms and PPF. The present study’s findings provide evidence that this combination may help in the early diagnosis of respiratory changes in these patients.
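The AUC values compared in the abstract above have a simple interpretation: the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The sketch below computes AUC directly from that pairwise definition. The scores and the function name `pairwise_auc` are hypothetical, and the study's own software is not named in the abstract.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n*m), which is fine for
    small groups like the ones in this illustration.
    """
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores (higher = more likely SAD).
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc_value = pairwise_auc(patients, controls)
```

Eight of the nine patient/control pairs are ordered correctly here, so the AUC is 8/9. A value of 0.5 corresponds to chance-level ranking, which helps put the single-parameter result (0.642) and the ensemble results (above 0.9) in perspective.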
- Research Article
20
- 10.15212/cvia.2023.0011
- Jan 1, 2023
- Cardiovascular Innovations and Applications
Objective: Cardiovascular disease (CVD) is one of the leading causes of death worldwide, and answers are urgently needed regarding many aspects, particularly risk identification and prognosis prediction. Real-world studies with large numbers of observations provide an important basis for CVD research but are constrained by high dimensionality, and missing or unstructured data. Machine learning (ML) methods, including a variety of supervised and unsupervised algorithms, are useful for data governance, and are effective for high dimensional data analysis and imputation in real-world studies. This article reviews the theory, strengths and limitations, and applications of several commonly used ML methods in the CVD field, to provide a reference for further application. Methods: This article introduces the origin, purpose, theory, advantages and limitations, and applications of multiple commonly used ML algorithms, including hierarchical and k-means clustering, principal component analysis, random forest, support vector machine, and neural networks. An example uses a random forest on the Systolic Blood Pressure Intervention Trial (SPRINT) data to demonstrate the process and main results of ML application in CVD. Conclusion: ML methods are effective tools for producing real-world evidence to support clinical decisions and meet clinical needs. This review explains the principles of multiple ML methods in plain language, to provide a reference for further application. Future research is warranted to develop accurate ensemble learning methods for wide application in the medical field.
- Research Article
27
- 10.1016/j.scitotenv.2021.148738
- Jun 29, 2021
- Science of the Total Environment
Parameter importance assessment improves efficacy of machine learning methods for predicting snow avalanche sites in Leh-Manali Highway, India
- Research Article
15
- 10.1080/23279095.2024.2382823
- Jul 31, 2024
- Applied Neuropsychology: Adult
The cognitive impairment known as dementia affects millions of individuals throughout the globe. The use of machine learning (ML) and deep learning (DL) algorithms has shown great promise as a means of early identification and treatment of dementia. Dementias such as Alzheimer’s Dementia, frontotemporal dementia, Lewy body dementia, and vascular dementia are all discussed in this article, along with a literature review on using ML algorithms in their diagnosis. Different ML algorithms, such as support vector machines, artificial neural networks, decision trees, and random forests, are compared and contrasted, along with their benefits and drawbacks. As discussed in this article, accurate ML models may be achieved by carefully considering feature selection and data preparation. We also discuss how ML algorithms can predict disease progression and patient responses to therapy. However, overreliance on ML and DL technologies should be avoided without further proof. It’s important to note that these technologies are meant to assist in diagnosis but should not be used as the sole criteria for a final diagnosis. The research implies that ML algorithms may help increase the precision with which dementia is diagnosed, especially in its early stages. The efficacy of ML and DL algorithms in clinical contexts must be verified, and ethical issues around the use of personal data must be addressed, but this requires more study.
- Research Article
65
- 10.1186/s12903-021-01996-0
- Dec 1, 2021
- BMC Oral Health
Background: Recently, the dental age estimation method developed by Cameriere has been widely recognized and accepted. Although machine learning (ML) methods can improve the accuracy of dental age estimation, no ML research exists on the use of the Cameriere dental age estimation method, making this research innovative and meaningful. Aim: The purpose of this research is to use seven lower left permanent teeth and three models [random forest (RF), support vector machine (SVM), and linear regression (LR)] based on the Cameriere method to predict children's dental age, and to compare the results with the Cameriere age estimation. Subjects and methods: This was a retrospective study that collected and analyzed orthopantomograms of 748 children (356 females and 392 males) aged 5–13 years. Data were randomly divided into training and test datasets in an 80–20% proportion for the ML algorithms. The procedure, starting with randomly creating new training and test datasets, was repeated 20 times. Seven permanent developing teeth on the left mandible (except wisdom teeth) were recorded using the Cameriere method. Then, the traditional Cameriere formula and three models (RF, SVM, and LR) were used to estimate the dental age. The age prediction accuracy was measured by five indicators: the coefficient of determination (R2), mean error (ME), root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE). Results: The research showed that the ML models have better accuracy than the traditional Cameriere formula. The ME, MAE, MSE, and RMSE values of the SVM model (0.004, 0.489, 0.392, and 0.625, respectively) and the RF model (−0.004, 0.495, 0.389, and 0.623, respectively) were the lowest, giving these models the highest accuracy.
In contrast, the ME, MAE, MSE and RMSE of the European Cameriere formula were 0.592, 0.846, 0.755, and 0.869, respectively, and those of the Chinese Cameriere formula were 0.748, 0.812, 0.890 and 0.943, respectively. Conclusions: Compared to the Cameriere formula, ML methods based on Cameriere’s maturation stages were more accurate in estimating dental age. These results support the use of ML algorithms instead of the traditional Cameriere formula.
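The evaluation protocol described in the abstract above (a random 80/20 split repeated 20 times, scored with ME, MAE, MSE, and RMSE) can be sketched as follows. A single-predictor least-squares line stands in for the study's RF, SVM, and LR models. The data are synthetic, and the helper names `fit_line`, `error_metrics`, and `repeated_holdout` are invented for this example. ME is taken as the mean of predicted minus actual, which is an assumed sign convention.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def error_metrics(actual, predicted):
    """ME, MAE, MSE and RMSE of predicted minus actual."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "me": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


def repeated_holdout(xs, ys, repeats=20, test_fraction=0.2, seed=0):
    """Average test-set metrics over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, round(len(xs) * test_fraction))
    runs = []
    for _ in range(repeats):
        rng.shuffle(indices)  # new random split on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        predictions = [slope * xs[i] + intercept for i in test_idx]
        runs.append(error_metrics([ys[i] for i in test_idx], predictions))
    return {key: sum(run[key] for run in runs) / repeats for key in runs[0]}


# Synthetic, noise-free data so the expected error is essentially zero.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
averaged = repeated_holdout(xs, ys)
```

Averaging over repeated splits reduces the dependence of the reported error on any one lucky or unlucky partition. That matters when, as in the abstract, the differences between models are in the third decimal place.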
- Research Article
3
- 10.1177/03611981241245679
- May 16, 2024
- Transportation Research Record: Journal of the Transportation Research Board
The cone penetration test (CPT) is widely used in geotechnical engineering to assess soil properties. Traditional methods of interpreting CPT data and classifying soils have limitations and are time-consuming. Machine learning (ML) algorithms offer a data-driven approach to automate and improve soil classification based on CPT data. In this study, the applicability of ML techniques was investigated to measure the reliability of soil classification prediction using raw CPT data. A dataset comprising raw CPT data and corresponding soil classifications derived from the adjacent boreholes was prepared for training and testing the selected ML techniques. Five ML algorithms, namely logistic regression, the support vector machine, the random forest (RF), K-nearest neighbors (KNN), and extreme gradient boosting (XGBoost), were applied. The results showed that the RF algorithm outperformed other ML methods, achieving an F1-score of 0.896. Comparing the performance of different algorithms, the RF consistently showed the best results, followed by XGBoost and KNN. These findings highlight the potential of ML algorithms, particularly the RF, in accurately predicting soil classification based on CPT data, thus improving the efficiency and reliability of geotechnical engineering applications.