Labeling Errors Research Articles

Cacao (Theobroma cacao L.) is a tropical tree species belonging to the Malvaceae, which originated in the lowland rainforests of the Amazon. It is a major agricultural commodity that contributes to the gross domestic product of West African countries, which together account for about 70% of the world’s production. Understanding the genetic diversity of a country's genetic resources, especially for an introduced crop such as cacao, is crucial to their management and effective utilization. However, very little is known from molecular data about the genetic structure of the cacao germplasm from Sierra Leone and Togo. We assembled cacao germplasm accessions (235 from Sierra Leone and 141 from Togo) from different seed gardens and farmers’ fields across the cacao-producing states/regions of these countries and studied their genetic diversity and population structure using 20 highly informative and reproducible KASP–SNP (single nucleotide polymorphism) markers. Genetic diversity among these accessions was assessed with three complementary clustering methods: model-based population structure (STRUCTURE), discriminant analysis of principal components (DAPC), and phylogenetic trees. STRUCTURE and DAPC were broadly consistent in allocating accessions to subpopulations or groups, although some discrepancies in their groupings were noted. Hierarchical clustering analysis grouped all the individuals into two major groups, as well as several sub-clusters. We also conducted a network analysis to elucidate genetic relationships among cacao accessions from Sierra Leone and Togo. Analysis of molecular variance (AMOVA) revealed high genetic diversity (86%) within accessions. A high rate of mislabeling/duplicate genotype names was revealed in both countries, which may be attributed to errors from the sources of introduction, labeling errors, and lost labels. This preliminary study demonstrates the use of KASP–SNPs for fingerprinting, which can help identify duplicate/mislabeled accessions and provides strong evidence for improving accuracy and efficiency in cacao germplasm management as well as the distribution of correct materials to farmers.
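As a rough illustration of how SNP fingerprints can expose duplicate accessions, the sketch below groups accession names that share an identical multi-locus profile. It is not the authors' pipeline: the function name, the accession names, and the genotype calls are all hypothetical, and exact matching is a simplifying assumption (real data would usually need some tolerance for missing calls and genotyping error).

```python
from collections import defaultdict


def find_duplicate_genotypes(profiles):
    """Group accession names that share an identical multi-locus SNP profile.

    profiles: dict mapping accession name -> sequence of genotype calls,
              one call per SNP marker (e.g. "AA", "AG", "GG").
    Returns a sorted list of name groups that contain two or more accessions.
    """
    groups = defaultdict(list)
    for name, calls in profiles.items():
        # Identical tuples of calls land in the same bucket.
        groups[tuple(calls)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical accessions and calls, invented for illustration only.
example_profiles = {
    "ACC-01": ("AA", "AG", "GG", "CT"),
    "ACC-02": ("AA", "AG", "GG", "CT"),  # same profile as ACC-01
    "ACC-03": ("AG", "AG", "GG", "TT"),
    "ACC-04": ("AA", "GG", "GG", "CT"),
}

duplicates = find_duplicate_genotypes(example_profiles)
print(duplicates)
```

Accessions that fall in the same group but carry different names are candidates for duplication or mislabeling and would still need to be checked against passport records.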

Background: Root system architecture (RSA) is of growing interest for plant improvement based on belowground root traits. Modern computing technology applied to images offers new pathways to plant trait improvement and selection through RSA analysis (using images to discern/classify root types and traits). However, a major stumbling block to image-based RSA phenotyping is image label noise, which reduces the accuracy of models that take images as direct inputs. To address the label noise problem, this study utilized an artificial intelligence model capable of classifying the RSA of alfalfa (Medicago sativa L.) directly from images and coupled it with downstream label improvement methods. Outputs from different models were compared with manual root classifications, and confident machine learning (CL) and reactive machine learning (RL) methods were tested to minimize the effects of subjective labeling and to improve labeling and prediction accuracies. Results: The CL algorithm modestly improved the Random Forest model's overall prediction accuracy on the Minnesota dataset (1%), while larger gains in accuracy were observed with the ResNet-18 model. The ResNet-18 cross-population prediction accuracy was improved (~8% to 13%) with CL compared to the original/preprocessed datasets. Training and testing data combinations with the highest accuracies (86%) resulted from the CL- and/or RL-corrected datasets for predicting taproot RSAs. Similarly, the highest accuracies achieved for the intermediate RSA class resulted from corrected data combinations. The highest overall accuracy (~75%) using the ResNet-18 model involved CL on a pooled dataset containing images from both sample locations. Conclusions: ResNet-18 DNN prediction accuracies of alfalfa RSA image labels are increased when CL and RL are employed. By enlarging the dataset to reduce overfitting while concurrently finding and correcting image label errors, the study demonstrates that accuracy increases of as much as ~11% to 13% can be achieved with semi-automated, computer-assisted preprocessing and data cleaning (CL/RL).
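The abstract does not detail its CL step. Assuming it resembles the general confident learning idea (flag an example when the model is confidently predicting a class other than its given label), a minimal sketch might look like the following. The function name, the two-class setup, and the probabilities are hypothetical, and the predicted probabilities are assumed to be out-of-sample (for example, from cross-validation).

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of integer class labels, one per example.
    pred_probs: list of per-class probability lists, one per example,
                assumed to come from out-of-sample predictions.
    Assumes every class appears at least once in `labels`.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples whose given label is k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k))

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no confident class, so no evidence against the label
        best = max(confident, key=lambda k: p[k])
        if best != y:
            flagged.append(i)
    return flagged


# Hypothetical data: class 0 = taproot, class 1 = intermediate.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.15, 0.85],  # labeled 0, but the model confidently predicts 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.25, 0.75],
]

flagged = flag_label_issues(labels, pred_probs)
print(flagged)
```

Flagged examples are candidates for manual review or relabeling rather than confirmed errors; how they are corrected or removed before retraining is a separate design choice.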

Articles published on Labeling Errors

Dataset preparation for CROHME 2019 for training a neural network

Learning and semiautomatic intention labeling for classification models: a COVID-19 dialog attendance study for chatbots

Genetic Diversity and Population Structure of Cacao (Theobroma cacao L.) Germplasm from Sierra Leone and Togo Based on KASP–SNP Genotyping

A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances

EEG-based emotional valence and emotion regulation classification: a data-centric and explainable approach

A-179 Improving specimen labeling errors in the pediatric emergency department at a tertiary care hospital

Automatic Road Crack Detection Using Convolutional Neural Network Based on Semi-Supervised Learning

Performance of the ChatGPT large language model for decision support in community pharmacy.

Transfusion sample mislabelling and wrong blood in tube in the UK: Insights from the national comparative audits of blood transfusion in 2012 and 2022.

Phenotyping Alfalfa (Medicago sativa L.) Root Structure Architecture via Integrating Confident Machine Learning with ResNet-18.

Better performance of deep learning pulmonary nodule detection using chest radiography with pixel level labels in reference to computed tomography: data quality matters

Addressing label noise for electronic health records: insights from computer vision for tabular data

Situational diversity in video person re-identification: introducing MSA-BUPT dataset

Enhancing the automatic facies classification of Brazilian presalt acoustic image logs with SwinV2-Unet: Leveraging transfer learning and confident learning

228: Hit the right vertebra on CBCT: A robust safeguard against labeling errors in a “one-stop-shop”

Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection Using Eight Million Samples Labeled With Imprecise Arrhythmia Alarms.

Identifying systems factors contributing to adverse events in maternal care using incident reports

TorchQL: A Programming Framework for Integrity Constraints in Machine Learning

Enhancing ground classification models for TBM tunneling: Detecting label errors in datasets

Mitigating Label Bias in Machine Learning: Fairness through Confident Learning
