Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics.

Aditya A Shastri, Yann Busnel, Milind B Ratnaparkhe, Kapil Ahuja

doi:10.7717/peerj.11927

Open Access

Abstract

Phenotypic characteristics of a plant species refer to its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data. This algorithm suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose the use of the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, for effective comparison, another sampling technique called Vector Quantization (VQ) is adapted for this data as well. VQ has recently generated promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm is in constructing the crucial similarity matrix for the clustering algorithm and defining probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we tested it on the phenotypic data obtained from about 2,400 Soybean species. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other competing clustering-with-sampling algorithms proposed here (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that our algorithm is up to 45% more accurate than HC. The computational complexity of our algorithm is more than an order of magnitude lower than that of HC.
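
To illustrate the probability-based sampling step named in the abstract, here is a minimal Python sketch of sequential pivotal sampling in its general textbook form. It is not the authors' implementation: the function name `pivotal_sample`, the tolerance `eps`, and the example inclusion probabilities are assumptions made here, and the abstract does not say how the paper defines its probabilities.

```python
import random

def pivotal_sample(probs, rng=None, eps=1e-9):
    """Sequential pivotal sampling (illustrative sketch).

    probs: inclusion probabilities, assumed to sum to an integer n.
    Returns the indices of the selected units.
    """
    rng = rng or random.Random()
    p = [float(x) for x in probs]

    def undecided(k):
        # A unit is undecided while its probability is strictly between 0 and 1.
        return eps < p[k] < 1 - eps

    i = None  # index of the unit carried forward from the previous "fight"
    for j in range(len(p)):
        if not undecided(j):
            continue
        if i is None:
            i = j
            continue
        s = p[i] + p[j]
        if s < 1:
            # One unit is rejected (0); the other inherits the combined mass.
            if rng.random() < p[j] / s:
                p[i], p[j] = 0.0, s
            else:
                p[i], p[j] = s, 0.0
        else:
            # One unit is selected (1); the other keeps the leftover mass.
            if rng.random() < (1 - p[j]) / (2 - s):
                p[i], p[j] = 1.0, s - 1
            else:
                p[i], p[j] = s - 1, 1.0
        # Carry forward whichever unit is still undecided, if any.
        if not undecided(i):
            i = j if undecided(j) else None
    return [k for k, v in enumerate(p) if v > 1 - eps]

# Hypothetical inclusion probabilities for six units; they sum to 3.
example_probs = [0.2, 0.8, 0.5, 0.5, 0.7, 0.3]
sample = pivotal_sample(example_probs, rng=random.Random(0))
```

Each fight preserves the expected value of both probabilities, so every unit ends up selected with its prescribed inclusion probability, and the sample size is fixed at the sum of the probabilities.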

Highlights

Genetic diversity has been an important foundation of plant breeding from the inception of agriculture since it helps develop new plants to meet the growing food demand globally
The hypothesis related to this is as follows: for a particular sampling technique, if the estimate of the population total using the samples is close to the actual population total, that sampling technique is considered good in an absolute sense
We demonstrate that use of sampling with modified Spectral Clustering (SC) does not deteriorate the quality of clustering
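
The hypothesis in the highlights compares an estimate of the population total with the actual total. A common way to form such an estimate from a probability sample is the Horvitz-Thompson estimator, which weights each sampled value by the inverse of its inclusion probability. The sketch below assumes that estimator; the highlights do not name the one used in the paper. The trait values, function names, and the equal-probability design are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def horvitz_thompson_total(values, inclusion_probs, sample):
    """Estimate the population total from the sampled indices."""
    return sum(values[k] / inclusion_probs[k] for k in sample)

def relative_error(estimate, actual):
    """Relative gap between an estimate and the actual total."""
    return abs(estimate - actual) / abs(actual)

# Hypothetical values of one phenotypic trait for a tiny population.
trait_values = [11.2, 9.8, 13.5, 10.1, 12.7, 8.9]
N, n = len(trait_values), 3
inclusion_probs = [n / N] * N  # assumption: equal-probability design
actual_total = sum(trait_values)

# Enumerate every possible sample of size n to show that the estimator
# matches the actual total on average under this design.
estimates = [
    horvitz_thompson_total(trait_values, inclusion_probs, s)
    for s in combinations(range(N), n)
]
mean_estimate = sum(estimates) / len(estimates)
worst_error = max(relative_error(e, actual_total) for e in estimates)
```

Under this reading, a sampling technique is judged by how small the gap between the estimate and `actual_total` is for the samples it actually draws.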

Summary

Introduction

Genetic diversity has been an important foundation of plant breeding from the inception of agriculture since it helps develop new plants to meet the growing food demand globally. The breeding process is a complex combination of multiple stages [1]. The first stage involves discovery of the native characteristics, where the selection of diverse parent donors is of paramount importance [2]. One way plant genetic diversity can be studied is by using phenotypic (physical) characteristics. This kind of analysis is relatively easy to do because a sufficiently large amount of data is available from different geographical areas. In the phenotypic context, which is our first focus, a few characteristics that play an important role are Days to 50% Flowering, Days to Maturity, Plant Height, 100 Seed Weight, Seed Yield Per Plant, Number of Branches Per Plant, etc.

Journal: PeerJ	Publication Date: Sep 7, 2021
Citations: 2	License type: CC BY 4.0
