Background: Root system architecture (RSA) is of growing interest as a route to plant improvement through belowground root traits. Modern computing technology applied to images offers new ways to improve and select plant traits through RSA analysis, using images to discern and classify root types and traits. However, a major obstacle to image-based RSA phenotyping is image label noise, which reduces the accuracy of models that take images as direct inputs. To address the label noise problem, this study used an artificial intelligence model that classifies the RSA of alfalfa (Medicago sativa L.) directly from images and coupled it with downstream label improvement methods. The outputs of different models were compared with manual root classifications of the images, and confident learning (CL) and reactive machine learning (RL) methods were tested to minimize the effects of subjective labeling and thereby improve labeling and prediction accuracy.

Results: The CL algorithm modestly improved the overall prediction accuracy of the Random Forest model on the Minnesota dataset (by 1%), whereas larger gains were observed with the ResNet-18 model. CL improved ResNet-18 cross-population prediction accuracy by ~8% to 13% compared with the original/preprocessed datasets. For predicting taproot RSAs, the training and testing data combinations with the highest accuracy (86%) used the CL- and/or RL-corrected datasets. Similarly, the highest accuracies for the intermediate RSA class were achieved with corrected data combinations. The highest overall ResNet-18 accuracy (~75%) was obtained by applying CL to a pooled dataset containing images from both sampling locations.

Conclusions: The accuracy of ResNet-18 deep neural network (DNN) predictions of alfalfa RSA image labels increases when CL and RL are employed. By enlarging the dataset to reduce overfitting while concurrently finding and correcting image label errors, this study demonstrates that accuracy gains of up to ~11% to 13% can be achieved with semi-automated, computer-assisted preprocessing and data cleaning (CL/RL).
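The abstract does not describe how confident learning (CL) flags label errors. As a rough illustration only, the sketch below shows a simplified CL rule, assuming that out-of-sample predicted class probabilities (for example, from cross-validation) are available for every image. A per-class threshold is set to the mean predicted probability of that class among images given that label. An image is flagged when it confidently belongs to a class other than its given label, which corresponds to an off-diagonal entry of the "confident joint". The function name `flag_label_issues` and the toy data are hypothetical. This is neither the authors' pipeline nor the cleanlab library's implementation, and RL is not modeled here.

```python
import numpy as np


def flag_label_issues(labels, pred_probs):
    """Hypothetical helper: simplified confident-learning label-error detection.

    labels: (n,) integer array of given (possibly noisy) class labels.
    pred_probs: (n, K) out-of-sample predicted class probabilities.
    Returns (issues, suggested): a boolean mask of likely label errors and
    the suggested label for each example (the given label if not flagged).
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    k = pred_probs.shape[1]

    # Per-class threshold: average self-confidence of examples carrying that label.
    # A class with no labeled examples gets an unreachable threshold (inf).
    thresholds = np.array([
        pred_probs[labels == j, j].mean() if np.any(labels == j) else np.inf
        for j in range(k)
    ])

    # True where an example's probability for class j reaches class j's threshold.
    above = pred_probs >= thresholds

    # Among classes that clear their threshold, pick the most probable one.
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = masked.argmax(axis=1)
    has_confident = above.any(axis=1)

    # Off-diagonal of the confident joint: confidently in a class other than the label.
    issues = has_confident & (confident_class != labels)
    suggested = np.where(issues, confident_class, labels)
    return issues, suggested


# Toy example (hypothetical data): image 2 is labeled class 0 but looks like class 1.
labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.10, 0.90],
    [0.30, 0.70],
    [0.15, 0.85],
])
issues, suggested = flag_label_issues(labels, pred_probs)
print(np.flatnonzero(issues))  # indices of images flagged for relabeling review
```

In practice, flagged images would be reviewed or relabeled before the model is retrained. Image 4 in the toy data clears no class threshold, so this rule leaves it unflagged rather than guessing.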