Machine Learning and Gaussian Mixture Model for Delineating Soil Cadmium Risk Zones
- Research Article
- 10.1038/s41598-022-20012-1
- Sep 30, 2022
- Scientific Reports
Deep neural networks (DNNs) have shown success in image classification, with high accuracy in recognition of everyday objects. Performance of DNNs has traditionally been measured assuming human accuracy is perfect. In specific problem domains, however, human accuracy is less than perfect, and a comparison between humans and machine learning (ML) models can be performed. In recognising everyday objects, humans have the advantage of a lifetime of experience, whereas DNN models are trained only with a limited image dataset. We compared the performance of human learners and two DNN models on an image dataset novel to both, i.e. histological images, thereby aiming to eliminate the advantage of prior experience that humans have over DNN models in image classification. Ten classes of tissues were randomly selected from the undergraduate first-year histology curriculum of a medical school in North India. Two ML models were developed based on the VGG16 (VML) and Inception V2 (IML) DNNs, using transfer learning, to produce a 10-class classifier. One thousand (1000) images belonging to the ten classes (i.e. 100 images from each class) were split into training (700) and validation (300) sets. After training, the VML and IML models achieved 85.67% and 89% accuracy on the validation set, respectively. The training set was also circulated to medical students (MS) of the college for a week. An online quiz, consisting of a random selection of 100 images from the validation set, was conducted among students (after obtaining informed consent) who volunteered for the study. Sixty-six students participated in the quiz, providing 6557 responses. In addition, we prepared a set of 10 images belonging to different classes of tissue not present in the training set (i.e. out-of-training-scope, or OTS, images). A second quiz was conducted on medical students with OTS images, and the ML models were also run on these OTS images.
The overall accuracy of MS in the first quiz was 55.14%. The two ML models were also run on the first quiz questionnaire, producing accuracies between 91% and 93%. The ML models outperformed more than 80% of the medical students. Analysis of the confusion matrices of both ML models and all medical students showed dissimilar error profiles. However, for the subset of students who achieved accuracy similar to the ML models, the error profile was also similar. Recognition of ‘stomach’ proved difficult for both humans and ML models. In four images in the first quiz set, both the VML model and medical students produced highly equivocal responses. Within these images, a pattern of bias was uncovered: the tendency of medical students to misclassify ‘liver’ tissue. The ‘stomach’ class proved most difficult for both MS and VML, producing 34.84% of all errors of MS and 41.17% of all errors of the VML model; the IML model, however, committed most errors in recognising the ‘skin’ class (27.5% of all errors). Analysis of the convolution layers of the DNN outlined features in the original image which might have led to misclassification by the VML model. On OTS images, however, the medical students produced a better overall score than both ML models, i.e. they successfully recognised patterns of similarity between tissues and could generalise their training to a novel dataset. Our findings suggest that, within the scope of training, ML models perform better than 80% of medical students, with a distinct error profile. However, students whose accuracy approaches that of the ML models tend to replicate the error profile of the ML models. This suggests a degree of similarity between how machines and humans extract features from an image. If asked to recognise images outside the scope of training, humans perform better at recognising patterns and likeness between tissues.
This suggests that ‘training’ is not the same as ‘learning’, and humans can extend their pattern-based learning to different domains outside of the training set.
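The error-profile comparison described above reduces to bookkeeping on confusion matrices. A minimal sketch of that bookkeeping, using an invented three-class toy example rather than the study's ten tissue classes or its data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def error_profile(cm):
    """Fraction of all errors attributable to each true class."""
    errors = cm.sum(axis=1) - np.diag(cm)  # per-class misclassification counts
    total = errors.sum()
    return errors / total if total else errors.astype(float)

# Toy responses: class 2 (say, 'stomach') draws most of the errors.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 2, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()
profile = error_profile(cm)
```

Comparing `profile` vectors between two classifiers (or a classifier and a student cohort) is one simple way to quantify whether their errors concentrate on the same classes.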
- Preprint Article
- 10.5194/ems2025-562
- Jul 16, 2025
Machine learning (ML) and deep learning (DL) models can play an important role in modelling complicated processes, a capability necessary for hydrological and climate-related applications. Generally, ML models use precipitation and temperature time series of a basin as input to develop a lumped rainfall-runoff model that simulates streamflow at the basin outlet. However, when a basin is divided into several sub-basins, Graph Neural Networks (GNNs) can treat each sub-basin as a node and link them together using a connectivity matrix to account for spatial variations of hydroclimatic variables. In this study, a GNN and various ML models with different architectures, ranging from neural networks to tree-based and gradient-boosting methods, were exploited for daily streamflow simulation over different case studies. For each case study, the basin was divided into a few sub-basins for which daily precipitation and temperature data were aggregated and used as input. For training the GNN, the connectivity matrix of sub-basins was also used as input. In total, 75% of the historical records were used to train the GNN and the ML models, e.g., artificial neural networks, support vector machine, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Category Boosting (CatBoost), while the rest were used for testing. Streamflow simulation was conducted with and without considering seasonality and lag times. The results clearly demonstrate that considering seasonality and time lags can enhance the accuracy of streamflow predictions based on the Kling–Gupta efficiency (KGE). Furthermore, the GNN with seasonality and time lags achieved promising results across the case studies, with KGE > 0.85 on training data and KGE > 0.59 on testing data. Among the ML models, boosting models, e.g., LightGBM and XGBoost, performed slightly better than the others.
Finally, this comparative analysis provides valuable insights for ML/DL applications in climate change impact assessments. Acknowledgements: This research work was carried out as part of the TRANSCEND project, with funding received from the European Union Horizon Europe Research and Innovation Programme under Grant Agreement No. 10108411.
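Model skill above is reported as the Kling–Gupta efficiency (KGE). A minimal sketch of the standard 2009 KGE formulation, which combines correlation, variability bias, and mean bias (an assumption about the exact variant used; this is not the authors' code):

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """Kling-Gupta efficiency (2009 formulation); 1 indicates a perfect fit."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kge_perfect = kling_gupta_efficiency(obs, obs)        # ~1.0 by construction
kge_biased = kling_gupta_efficiency(1.1 * obs, obs)   # < 1.0: scaling is penalised
```

Thresholds such as KGE > 0.85 (training) and KGE > 0.59 (testing) in the abstract can be checked directly against a function like this.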
- Research Article
- 10.1016/j.resourpol.2023.104216
- Oct 1, 2023
- Resources Policy
A novel deep-learning technique for forecasting oil price volatility using historical prices of five precious metals in context of green financing – A comparison of deep learning, machine learning, and statistical models
- Preprint Article
- 10.5194/egusphere-egu23-11636
- May 15, 2023
In recent years, machine learning (ML) models have proven useful in solving problems across a wide variety of fields such as medicine, economics, manufacturing, transportation, energy, and education. With increased interest in ML models and advances in sensor technologies, ML models are now widely applied in the civil engineering domain as well. ML models enable analysis of large amounts of data, automation, and improved decision making, and provide more accurate predictions. While several state-of-the-art reviews have been conducted in individual sub-domains of civil engineering (e.g., geotechnical engineering, structural engineering) or on specific application problems (e.g., structural damage detection, water quality evaluation), little effort has been devoted to a comprehensive review of ML models applied in civil engineering that compares them across sub-domains. A systematic, but domain-specific, literature review framework is needed to effectively classify and compare the models. To that end, this study proposes a novel review approach based on the hierarchical classification tree “D-A-M-I-E (Domain-Application problem-ML models-Input data-Example case)”. The “D-A-M-I-E” classification tree classifies ML studies in civil engineering based on (1) the domain of civil engineering, (2) the application problem, (3) the applied ML models, and (4) the data used in the problem. Moreover, the data used for the ML models in each application example are examined based on the specific characteristics of the domain and the application problem. For a comprehensive review, five domains (structural engineering, geotechnical engineering, water engineering, transportation engineering, and energy engineering) are considered, and the ML application problems are divided into five types (prediction, classification, detection, generation, and optimization). Based on the “D-A-M-I-E” classification tree, about 300 ML studies in civil engineering are reviewed.
For each domain, analysis and comparison of the following questions have been conducted: (1) which problems are mainly solved with ML models, (2) which ML models are mainly applied in each domain and problem, (3) how advanced the ML models are, and (4) what kinds of data are used and what processing of the data is performed for applying the ML models. This paper also assesses the expansion and applicability of the proposed methodology to other areas (e.g., Earth system modeling, climate science). Furthermore, based on the identification of research gaps of ML models in each domain, this paper provides future directions for ML in civil engineering based on approaches to dealing with data (e.g., collection, handling, storage, and transmission) and aims to support the application of ML models in other fields.
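The D-A-M-I-E hierarchy described above maps naturally onto nested key-value structures. A minimal, hypothetical sketch; the entries are illustrative and not drawn from the reviewed studies:

```python
# D-A-M-I-E as nested dicts: Domain -> Application problem -> ML model
# -> {Input data, Example case}. All entries below are invented examples.
damie = {
    "structural engineering": {
        "damage detection": {
            "CNN": {
                "input": "accelerometer time series",
                "example": "bridge vibration monitoring",
            }
        }
    },
    "water engineering": {
        "water quality evaluation": {
            "random forest": {
                "input": "sensor water-quality parameters",
                "example": "river monitoring network",
            }
        }
    },
}

def models_for(tree, domain, problem):
    """List the ML models recorded under a given domain/problem pair."""
    return sorted(tree.get(domain, {}).get(problem, {}).keys())
```

A structure like this makes the cross-domain comparisons in the review (which models appear where, with what inputs) straightforward queries rather than manual tabulation.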
- Research Article
- 10.1007/s11356-024-35764-8
- Jan 1, 2025
- Environmental Science and Pollution Research
Human-induced global warming, primarily attributed to the rise in atmospheric CO2, poses a substantial risk to the survival of humanity. While most research focuses on predicting annual CO2 emissions, which are crucial for setting long-term emission mitigation targets, the precise prediction of daily CO2 emissions is equally vital for setting short-term targets. This study examines the performance of 14 models in predicting daily CO2 emissions data from 1/1/2022 to 30/9/2023 across the top four polluting regions (China, India, the USA, and the EU27&UK). The 14 models include four statistical models (ARMA, ARIMA, SARMA, and SARIMA), three machine learning models (support vector machine (SVM), random forest (RF), and gradient boosting (GB)), and seven deep learning models (artificial neural network (ANN), recurrent neural network variants such as the gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BILSTM), and three hybrid CNN-RNN combinations). Performance evaluation employs four metrics (R2, MAE, RMSE, and MAPE). The results show that the machine learning (ML) and deep learning (DL) models, with higher R2 values (0.714–0.932) and lower RMSE values (0.247–0.480), outperformed the statistical models, which had R2 values of −0.060 to 0.719 and RMSE values of 0.537 to 1.695, in predicting daily CO2 emissions across all four regions. The performance of the ML and DL models was further enhanced by differencing, a technique that improves accuracy by ensuring stationarity and creating additional features and patterns from which the model can learn. Additionally, applying ensemble techniques such as bagging and voting improved the performance of the ML models by approximately 9.6%, whereas hybrid CNN-RNN combinations enhanced the performance of the RNN models. In summary, the performance of the ML and DL models was broadly similar.
However, due to the high computational requirements associated with DL models, the recommended models for daily CO2 emission prediction are ML models using the ensemble techniques of voting and bagging. These models can assist in accurately forecasting daily emissions, aiding authorities in setting targets for CO2 emission reduction.
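Differencing, credited above with improving ML and DL accuracy by enforcing stationarity, is a one-line transform. A minimal sketch with synthetic numbers, not the study's emissions data:

```python
import numpy as np

def difference(series, lag=1):
    """First-order (or seasonal, via lag) differencing: x[t] - x[t-lag]."""
    x = np.asarray(series, float)
    return x[lag:] - x[:-lag]

# A linearly trending series becomes a constant (stationary) series
# after first differencing; seasonal lags work the same way.
daily_co2 = np.array([100.0, 102.0, 104.0, 106.0, 108.0])
diff1 = difference(daily_co2)         # constant increments
diff2 = difference(daily_co2, lag=2)  # two-step increments
```

The differenced series (and lagged copies of it) can then be fed to the models as additional features, which is one common reading of the "additional features and patterns" remark above.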
- Research Article
- 10.13031/jnrae.15647
- Jan 1, 2023
- Journal of Natural Resources and Agricultural Ecosystems
Highlights: Machine Learning (ML) models are identified, reviewed, and analyzed for HAB predictions. Data preprocessing is vital for efficient ML model development. ML models for toxin production and monitoring are limited.
Abstract: Harmful algal blooms (HABs) are detrimental to livestock, humans, pets, the environment, and the global economy, which calls for a robust approach to their management. While process-based models can inform practitioners about HAB-enabling conditions, they have inherent limitations in accurately predicting harmful algal blooms. To address these limitations, Machine Learning (ML) models can potentially leverage large volumes of IoT data to aid in near real-time predictions. ML models have evolved as efficient tools for understanding patterns and relationships between water quality parameters and HAB expansion. This review describes ML models currently used for predicting and forecasting HABs in freshwater ecosystems and presents model structures and their application for predicting algal parameters and related toxins. The review revealed that regression trees, random forest, Artificial Neural Network (ANN), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) are the most frequently used models for HAB monitoring. This review shows ML models' prowess in identifying significant variables influencing algal growth, HAB drivers, and multistep HAB prediction. Hybrid models also improve the prediction of algal-related parameters through improved optimization techniques and variable selection algorithms. While ML models often focus on algal biomass prediction, few studies apply ML models to toxin monitoring and prediction. This limitation can be associated with a lack of high-frequency toxin datasets for model development, and exploring this domain is encouraged.
This review serves as a guide for policymakers and researchers to implement ML models for HAB prediction and reveals the potential of ML models for decision support and early prediction for HAB management. Keywords: Cyanobacteria, Freshwater, Harmful algal blooms, Machine learning, Water quality.
- Research Article
- 10.2196/47833
- Nov 20, 2023
- JMIR Medical Informatics
Machine learning (ML) models give patients with diabetes mellitus (DM) more options for properly managing blood glucose (BG) levels. However, because of the numerous types of ML algorithms, choosing an appropriate model is vitally important. In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. The PubMed, Embase, Web of Science, and IEEE Xplore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events in patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of the included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. In total, 46 eligible studies were included in the meta-analysis. Regarding ML models for predicting BG levels, the mean absolute root mean square error (RMSE) in PHs of 15, 30, 45, and 60 minutes was 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance across the different PHs.
Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250.
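The pooled likelihood ratios above follow directly from sensitivity and specificity. A minimal sketch of that conversion; the input values here are illustrative, not the review's pooled estimates:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity/specificity.

    LR+ = sens / (1 - spec): how much a positive result raises the odds.
    LR- = (1 - sens) / spec: how much a negative result lowers the odds.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# Illustrative values only (not from the review):
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
```

An LR+ near 8 and an LR- near 0.3, as reported for hypoglycemia prediction above, correspond roughly to this sensitivity/specificity regime.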
- Research Article
- 10.1016/j.eswa.2020.114498
- Dec 24, 2020
- Expert Systems with Applications
Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling
- Research Article
- 10.3390/en14217049
- Oct 28, 2021
- Energies
Building an effective Machine Learning (ML) model for a data set is a difficult task involving various steps. One of the most important steps is to compare a substantial number of generated ML models to find the optimal one for deployment. It is challenging to compare such models when they use a dynamic number of features. Comparison involves more than finding differences in ML model performance, as users are also interested in the relations between features and model performance, such as feature importance, for ML explanations. This paper proposes RadialNet Chart, a novel visualisation approach for comparing ML models trained with different numbers of features of a given data set while revealing implicit dependent relations. In RadialNet Chart, ML models and features are represented by lines and arcs, respectively. These lines are generated efficiently using a recursive function. The dependence of ML models on a dynamic number of features is encoded into the structure of the visualisation, where ML models and their dependent features are directly revealed through the related line connections. ML model performance information is encoded with colour and line width. Taken together with the structure of the visualisation, feature importance can be discerned directly in RadialNet Chart for ML explanations. Compared with other commonly used visualisation approaches, RadialNet Chart simplifies the ML model comparison process: it is more efficient at helping users focus their attention on visual elements of interest, and it makes it easier to compare ML performance to find the optimal model and to discern important features visually and directly, instead of through complex algorithmic calculations.
- Research Article
- 10.1175/jcli-d-21-0113.1
- Jun 8, 2021
- Journal of Climate
In this study, four machine learning (ML) models (gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) are used to perform seasonal forecasts of non-monsoonal winter precipitation over the Eurasian continent (30-60°N, 30-105°E) (NWPE). The seasonal forecast results are compared with those from a traditional linear regression (LR) model and two dynamic models. The ML and LR models are trained using data for the period 1979-2010, and these empirical models are then used to perform seasonal forecasts of NWPE for 2011-2018. Our results show that the four ML models have reasonable seasonal forecast skill for the NWPE and clearly outperform the LR model. The ML models and the dynamic models produce skillful forecasts of the NWPE over different regions. The ensemble mean of the forecasts including both the ML models and the dynamic models shows higher forecast skill for the NWPE than the ensemble mean of the dynamic models alone. The forecast skill of the ML models mainly benefits from a skillful forecast of the third empirical orthogonal function (EOF) mode (EOF3) of the NWPE, which has a good and consistent prediction among the ML models. Our results also illustrate that Arctic sea ice in the previous autumn is the most important predictor in the ML models for forecasting the NWPE. This study suggests that ML models may be useful tools for helping improve seasonal forecasts of the NWPE.
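The EOF decomposition behind the skill analysis above can be sketched as a singular value decomposition of the anomaly field. The dimensions below are synthetic and illustrative, not the study's grid:

```python
import numpy as np

def eof_modes(field, n_modes=3):
    """EOF analysis via SVD of the anomaly matrix (time x space).

    Returns the leading spatial patterns (EOFs), their principal-component
    time series, and the fraction of variance each mode explains.
    """
    anomalies = field - field.mean(axis=0)       # remove time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per mode
    pcs = u[:, :n_modes] * s[:n_modes]           # mode time series
    patterns = vt[:n_modes]                      # spatial patterns (EOFs)
    return patterns, pcs, explained[:n_modes]

rng = np.random.default_rng(0)
field = rng.normal(size=(40, 25))  # 40 winters x 25 grid points (synthetic)
patterns, pcs, explained = eof_modes(field)
```

A statement like "skill mainly comes from EOF3" then means the models predict the third column of `pcs` well, from predictors such as the previous autumn's Arctic sea ice.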
- Research Article
- 10.1016/j.jclepro.2024.143166
- Jul 15, 2024
- Journal of Cleaner Production
Evaluating external generalizability of machine learning models for recycled aggregate concrete property prediction
- Research Article
- 10.3390/cancers14051121
- Feb 22, 2022
- Cancers
Simple Summary: Endoscopic resection (ER) is a treatment option for clinically T1a early gastric cancer (EGC) without suspicion of lymph node metastasis (LNM). In patients with non-curative resection after ER, additional surgery is recommended owing to the LNM risk. However, among patients treated with additional surgery after ER, the actual rate of LNM was about 5–10%; that is, the remaining patients underwent unnecessary surgery. Therefore, it is crucial to estimate LNM risk in EGC patients to determine additional management after ER. We derived a machine learning (ML) model to stratify the LNM risk in EGC patients and validated its performance. The constructed ML model, which showed good performance with an area under the receiver operating characteristic curve of 0.85 or higher, could stratify LNM risk into very low (<1%), low (<3%), intermediate (<7%), and high (≥7%) risk categories. These findings suggest that the ML model can stratify the LNM risk in EGC patients.
Abstract: Stratification of the risk of lymph node metastasis (LNM) in patients with non-curative resection after endoscopic resection (ER) for early gastric cancer (EGC) is crucial in determining additional treatment strategies and preventing unnecessary surgery. Hence, we developed a machine learning (ML) model and validated its performance for the stratification of LNM risk in patients with EGC. We enrolled patients who underwent primary surgery or additional surgery after ER for EGC between May 2005 and March 2021. Additionally, patients who underwent ER alone for EGC between May 2005 and March 2016 and were followed up for at least 5 years were included. The ML model was built on a development set (70%) using logistic regression, random forest (RF), and support vector machine (SVM) analyses and assessed on a validation set (30%). In the validation set, LNM was found in 337 of 4428 patients (7.6%).
Among the total patients, the area under the receiver operating characteristic (AUROC) for predicting LNM risk was 0.86 in the logistic regression, 0.85 in RF, and 0.86 in SVM analyses; in patients with initial ER, AUROC for predicting LNM risk was 0.90 in the logistic regression, 0.88 in RF, and 0.89 in SVM analyses. The ML model could stratify the LNM risk into very low (<1%), low (<3%), intermediate (<7%), and high (≥7%) risk categories, which was comparable with actual LNM rates. We demonstrate that the ML model can be used to identify LNM risk. However, this tool requires further validation in EGC patients with non-curative resection after ER for actual application.
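The four risk bands above amount to simple thresholding of a model's predicted LNM probability. A minimal sketch, using the cut-offs quoted in the abstract with invented probabilities:

```python
def lnm_risk_category(prob):
    """Map a predicted LNM probability to the four bands in the abstract:
    very low (<1%), low (<3%), intermediate (<7%), high (>=7%)."""
    if prob < 0.01:
        return "very low"
    elif prob < 0.03:
        return "low"
    elif prob < 0.07:
        return "intermediate"
    return "high"

# Invented example probabilities, one per band:
categories = [lnm_risk_category(p) for p in (0.005, 0.02, 0.05, 0.12)]
```

The clinical value claimed above is that observed LNM rates within each band matched these nominal thresholds, which is what makes the bands usable for treatment decisions.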
- Research Article
- 10.3389/fcvm.2022.812276
- Apr 6, 2022
- Frontiers in Cardiovascular Medicine
Objective: To compare the performance, clinical feasibility, and reliability of statistical and machine learning (ML) models in predicting heart failure (HF) events.
Background: Although ML models have been proposed to revolutionize medicine, their promise in predicting HF events has not been investigated in detail.
Methods: A systematic search was performed on Medline, Web of Science, and IEEE Xplore for studies published between January 1, 2011 and July 14, 2021 that developed or validated at least one statistical or ML model that could predict all-cause mortality or all-cause readmission of HF patients. The Prediction Model Risk of Bias Assessment Tool was used to assess the risk of bias, and a random-effects model was used to evaluate the pooled c-statistics of the included models.
Results: Two hundred and two statistical model studies and 78 ML model studies were included from the retrieved papers. The pooled c-indices of statistical models in predicting all-cause mortality, ML models in predicting all-cause mortality, statistical models in predicting all-cause readmission, and ML models in predicting all-cause readmission were 0.733 (95% confidence interval 0.724–0.742), 0.777 (0.752–0.803), 0.678 (0.651–0.706), and 0.660 (0.633–0.686), respectively, indicating that ML models did not show consistent superiority over statistical models. Head-to-head comparison revealed similar results. Meanwhile, the immoderate use of predictors limited the feasibility of ML models. The risk-of-bias analysis indicated that the technical pitfalls of ML models were more serious than those of statistical models. Furthermore, the efficacy of ML models among different HF subgroups is still unclear.
Conclusions: ML models did not achieve a significant advantage in predicting events, and their clinical feasibility and reliability were worse.
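The pooled c-statistics above come from a random-effects meta-analysis. A minimal sketch of the common DerSimonian-Laird estimator (an assumption about the exact method; the per-study inputs are invented):

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate (DerSimonian-Laird) with a 95% CI."""
    y = np.asarray(estimates, float)
    v = np.asarray(variances, float)
    w = 1.0 / v                                    # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)             # Cochran's Q heterogeneity
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_star = 1.0 / (v + tau2)                      # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Invented c-statistics and variances from three hypothetical studies:
pooled, ci = dersimonian_laird([0.72, 0.75, 0.70], [0.001, 0.002, 0.001])
```

Pooling c-indices this way is what allows the direct comparison of the 0.733 (statistical) and 0.777 (ML) mortality estimates quoted above.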
- Preprint Article
- 10.5194/egusphere-egu22-8321
- Mar 28, 2022
The consequences of ever-increasing human interference with freshwater systems, e.g., through land-use and climate changes, are already felt in many regions of the world, e.g., through shifts in freshwater availability and its partitioning between green (evapotranspiration) and blue (runoff) water fluxes. In this study, we have developed a machine learning (ML) model for the prediction of green-blue water flux partitioning (WFP) under different climate, land-use, and other landscape and hydrological catchment conditions around the world. ML models have shown relatively high predictive performance compared to more traditional modelling methods for several tasks in the geosciences. However, ML is also rightly criticized for providing theory-free “black-box” models that may fail in predictions under forthcoming non-stationary conditions. We here address the ML model interpretability gap using Shapley values, an explainable artificial intelligence technique. We also assess ML model predictability using a dissimilarity index (DI). For ML model training and testing, we use different parts of a total database compiled for 3482 hydrological catchments with available data for daily runoff over at least 25 years. The target variable of the ML model is the blue-water partitioning ratio between average runoff and average precipitation (and the complementary, water-balance-determined green-water partitioning ratio) for each catchment. The predictor variables are hydro-climatic, land-cover/use, and other catchment indices derived from precipitation and temperature time series, land cover maps, and topography data.
As a basis for the ML modelling, we also investigate and quantify (through data averaging over moving sub-periods of different time lengths) a minimum temporal aggregation scale for water flux averaging (referred to as the flux equilibration time, Teq) required to reach a stable temporal average runoff (and evapotranspiration) fraction of precipitation in each catchment; for 99% of catchments, Teq is found to be ≤2 years, with longer Teq emerging for catchments estimated to have a higher ratio Rgw/Ravg, i.e., a higher groundwater flow contribution (Rgw) to total average runoff (Ravg). The Cubist model used for the ML modelling yields a Kling-Gupta efficiency of 0.86, while the Shapley value analysis indicates mean annual precipitation and temperature as the most important variables in determining the WFP, followed by the average slope in each catchment. A DI threshold is further used to label new data points as inside or outside the ML model's area of applicability (AoA). Comparison between test data points outside and inside the AoA reveals which catchment characteristics are mostly responsible for the ML model's loss of predictability. Predictability is lower for catchments with: larger Teq and Rgw/Ravg; higher phase lag between peak precipitation and peak temperature over the year; lower forest and agricultural land fractions; and an aridity index much higher or much lower than 1 (implying major water or energy limitation, respectively). Identifying such predictability limits is crucial for understanding, and facilitating user awareness of, the applicability and forecasting ability of such data-driven ML modelling under different prevailing and changing future hydro-climatic, land-use, and groundwater conditions.
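The flux-equilibration time Teq described above can be estimated by widening the averaging window until the runoff/precipitation ratio stabilizes. A minimal sketch under simplifying assumptions: synthetic daily data, and a tolerance-based stability criterion that may differ from the authors' exact procedure:

```python
import numpy as np

def partition_ratio(runoff, precipitation):
    """Blue-water partitioning ratio: mean runoff over mean precipitation."""
    return np.mean(runoff) / np.mean(precipitation)

def equilibration_time(runoff, precip, steps_per_year=365, tol=0.01):
    """Smallest averaging window (whole years) for which every moving
    runoff/precipitation ratio stays within `tol` of the long-term ratio."""
    target = partition_ratio(runoff, precip)
    for years in range(1, len(runoff) // steps_per_year + 1):
        n = years * steps_per_year
        ratios = [partition_ratio(runoff[i:i + n], precip[i:i + n])
                  for i in range(0, len(runoff) - n + 1, steps_per_year)]
        if max(abs(r - target) for r in ratios) <= tol:
            return years
    return None  # never stabilises within the record length

rng = np.random.default_rng(1)
precip = 2.0 + rng.random(10 * 365)  # 10 years of synthetic daily precipitation
runoff = 0.4 * precip                # perfectly proportional runoff
teq_years = equilibration_time(runoff, precip)
```

With runoff strictly proportional to precipitation, every window yields the same ratio, so Teq collapses to one year; real catchments with large groundwater contributions would need longer windows, as the abstract reports.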
- Research Article
- 10.62487/yyx99243
- Jan 27, 2024
- Web3 Journal: ML in Health Science
Aim: The majority of machine learning (ML) models in healthcare are built on retrospective data, much of which is collected without explicit patient consent for use in artificial intelligence (AI) and ML applications. The primary aim of this study was to evaluate whether clinicians and scientific researchers themselves consent to provide their own data for the training of ML models. Materials and Methods: The study was conducted through an anonymous online survey distributed via platforms such as Telegram, LinkedIn, and Viber. The target audience comprised specific international groups, primarily Russian-, German-, and English-speaking clinicians and scientific researchers, ranging in expertise and experience from beginners to veterans. The survey centered on a single, pivotal question: “Do You Consent to the Use of Your Biological and Private Data for Training Machine Learning and AI Models?” Respondents could choose between two responses: “Yes” and “No”. Results: The survey was conducted in January 2024. A total of 119 unique and verified individuals participated. The results revealed that only 50% of respondents (63 persons) expressed consent to provide their own data for the training of ML and AI models. Conclusion: In the development of ML and AI models, particularly open-source ones, it is crucial to ascertain whether participants are willing to provide their private data. While ML algorithms can transform the nature of data, it is important to remember that the primary owner of the data is the individual. Our findings show that in 50% of the cases, even participants from scientific research and clinical backgrounds – individuals typically accountable for ensuring data quality in AI and ML model development – did not consent to the use of their data in AI and ML settings.
This highlights the need for more stringent consent processes and ethical considerations in the utilization of personal data in AI and ML research.