Abstract

Arabia is the largest peninsula in the world and hosts more than 3000 species of vascular plants. Little effort has been made to build a multi-locus barcode library that can identify and discriminate the recorded plant species. This study assessed the reliability of the Arabian plant barcodes (>1500; rbcL and matK) available in the public repository NCBI GenBank, using unsupervised and supervised methods. A comparative analysis with a standard dataset (FINBOL) was carried out to assess the reliability of the methods and markers. Our analysis suggests that, among the unsupervised methods, TaxonDNA’s All Species Barcode (ASB) criterion exhibits the highest accuracy for rbcL barcodes, followed by matK barcodes, on the aligned FINBOL dataset. For the Arabian plant barcode dataset (GBMA), however, the supervised methods performed better than the unsupervised ones: the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers were the most robust, successfully recognizing true species from both barcode markers in the aligned and alignment-free datasets, respectively. The multi-class classifier ranked next in species resolution, though its performance declined when it was used to recognize true species. The supervised learning approach gave similar results for the FINBOL dataset, where the matK marker showed higher overall accuracy than rbcL. The lower rate of species identification for matK in the GBMA data could be due to a higher evolutionary rate or to gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. A low number of sequences and the presence of singletons could also reduce species resolution, as observed in the GBMA dataset, which lacks sufficient species membership. We encourage taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) Systems. Together, our efforts could help improve the rate of species identification for Arabian vascular plants.
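
To make the idea of alignment-free, supervised barcode identification concrete, the following is a minimal sketch rather than the authors' pipeline. It swaps in a plain k-mer count profile with cosine similarity and a 1-nearest-neighbour rule as a simplified stand-in for the gappy-kernel K-Nearest Neighbor classifier named above. The function names, the choice of k = 3, and the toy sequences and species labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_species(query, reference, k=3):
    """Return (species, similarity) of the most similar reference sequence."""
    q = kmer_profile(query, k)
    scored = [
        (cosine_similarity(q, kmer_profile(seq, k)), species)
        for species, seq in reference
    ]
    best_score, best_species = max(scored, key=lambda pair: pair[0])
    return best_species, best_score


# Hypothetical toy reference library: (species label, barcode fragment).
reference = [
    ("Species A", "ATGTCACCACAAACAGAGACTAAAGC"),
    ("Species A", "ATGTCACCACAAACAGAGACTAAAGT"),
    ("Species B", "GGGTTTCCCGGGTTTCCCGGGTTTCC"),
]

species_a, score_a = nearest_species("ATGTCACCACAAACAGAGACTAAAGC", reference)
species_b, score_b = nearest_species("GGGTTTCCCGGGTTTCCC", reference)
```

Because the profile is built directly from unaligned sequences, no multiple sequence alignment is needed. A species represented by a single reference sequence (a singleton) can be matched only through that one sequence, which is one way low species membership can limit identification.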

Highlights

  • The curated dataset was labelled as GBMA; it consisted of 1118 sequences belonging to the rbcL marker, representing 414 species, and 277 sequences belonging to the matK marker, representing 113 species [https://dx.doi.org/

  • Besides the GBMA dataset, the standardized dataset (FINBOL) was prepared from the DS-FBPL dataset available at the Barcode of Life Data (BOLD) Systems to test the robustness of the methods and markers employed

  • A standard curated dataset (FINBOL) was obtained from BOLD Systems and analyzed side-by-side to understand the performance of methods and markers employed
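
The dataset sizes quoted in the highlights can be turned into a rough coverage figure. The sketch below computes the mean number of sequences per species for each marker from those counts. It also includes a small helper for counting singletons in a list of species labels. The helper and its example labels are hypothetical, since per-species counts are not given in the text.

```python
from collections import Counter

# Sequence and species counts for the GBMA dataset, as stated in the highlights.
gbma = {
    "rbcL": {"sequences": 1118, "species": 414},
    "matK": {"sequences": 277, "species": 113},
}


def mean_sequences_per_species(sequences, species):
    """Average number of barcode sequences available per species."""
    return sequences / species


def count_singletons(species_labels):
    """Number of species represented by exactly one sequence."""
    per_species = Counter(species_labels)
    return sum(1 for n in per_species.values() if n == 1)


coverage = {
    marker: round(mean_sequences_per_species(**counts), 2)
    for marker, counts in gbma.items()
}

# Hypothetical label list, for illustration only.
example_labels = ["sp1", "sp1", "sp2", "sp3", "sp3", "sp3"]
example_singletons = count_singletons(example_labels)
```

Both markers average fewer than three sequences per species, which is consistent with the abstract's remark that low sequence numbers and singletons may limit species resolution in GBMA.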


Summary

Introduction

Saudi Arabia is the largest country (830,000 mi²), covering almost four-fifths of the Arabian Peninsula [1], whereas Bahrain is the smallest (295.5 mi²). Plant diversity estimates put the number of native plants in the Arabian Peninsula at more than 3500 [2]. Iraq has the most diverse flora, with more than 3300 species [3], followed by Yemen (number of species, n = 2838) [4], Jordan (n > 2500) [5], Saudi Arabia (n = 2282) [6], Oman (n = 1239) [7], UAE (n = 731) [8], Kuwait (n = 407) [9], Qatar (n = 400) [10] and Bahrain (n = 307) [11].
