Abstract

Machine learning (ML) and its many applications offer comparative advantages for interpreting knowledge about different agricultural processes. However, several challenges impede its proper use, as can be seen in the phenotypic characterization of germplasm banks. The objective of this research was to test and optimize different ML-based analysis methods for the prioritization and selection of morphological descriptors of Rubus spp. A total of 55 descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods such as random forest (RF), support vector machines in the linear and radial forms, and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with optimized descriptors based on RF had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization.
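One of the selection criteria named in the abstract is node purity (Gini index). As a minimal sketch of what that quantity measures, the following computes the decrease in Gini impurity obtained by splitting accessions on a single descriptor. The function names, the threshold, and the toy data are all hypothetical and chosen only for illustration; they are not taken from the study, and this is not the authors' implementation. A random forest typically aggregates decreases of this kind over many trees and splits to rank variables.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting on `values <= threshold` (hypothetical helper)."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Toy data, invented for illustration only (not from the study).
groups = ["A", "A", "A", "B", "B", "B"]
leaf_length = [4.1, 4.3, 4.0, 7.2, 7.5, 6.9]  # separates the two groups cleanly
spine_count = [3, 8, 5, 4, 7, 6]              # mixes the two groups

clean = gini_decrease(leaf_length, groups, threshold=5.0)
mixed = gini_decrease(spine_count, groups, threshold=5.0)
print(f"leaf_length decrease: {clean:.3f}")
print(f"spine_count decrease: {mixed:.3f}")
```

In this toy case the descriptor that separates the groups cleanly yields the larger impurity decrease, which is the intuition behind preferring descriptors with high purity-based importance.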

Highlights

  • Machine learning (ML) is a form of artificial intelligence (AI) that gives machines the ability to learn through the use of algorithms and a training process [1]. It is used in tandem with big data technologies and high-performance computing [2,3], which, together with information and communication technologies (ICTs), the Internet of Things (IoT), and deep learning, among other tools, have created new opportunities for data-intensive science

  • In decreasing order of ability to discriminate adequately based on the appropriate selection of descriptors, the algorithms were random forest (RF), RF, neural networks (NN), support vector machine (SVM) radial (SVMr) and linear (SVMl), with area under the curve (AUC) classification accuracy values of 0.76, 0.64, 0.31, 0.21, and

  • Studies on Rubus subgenus Rubus highlight these descriptors in the determination of qualitative and quantitative variation among accessions [60]. These results suggest that many of the numerical descriptors prioritized by the RF method show the possibility of generating scale ratios, which is useful for comparing intervals, differences, and derivatives in absolute or dimensionless values [60]


Summary

Introduction

Machine learning (ML) is a form of artificial intelligence (AI) that gives machines the ability to learn through the use of algorithms and a training process [1]. It is used in tandem with big data technologies and high-performance computing [2,3], which, together with information and communication technologies (ICTs), the Internet of Things (IoT), and deep learning, among other tools, have created new opportunities for data-intensive science. These tools are being applied in multiple areas, including agriculture, as an emerging technology [2,4]. This implies avoiding overfitting, overly complex models, and data without standardization, as well as improving the statistics used and data security, among others [2,9,10].
