Abstract

Background

DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short, standardized segment of the genome. The nucleotide sequences of the maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e. a regular expression) for plant species identification.

Methods

To generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL Plant Working Group. Initially, we performed Multiple Sequence Alignment (MSA) of all species, followed by construction of a Position Specific Scoring Matrix (PSSM) for both loci to estimate the percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at the genus and species level using the PSSM for the matK dataset. Combining the DP and consecutive pattern distances, we generated molecular signatures for each species. Finally, we performed a comparative assessment of these signatures against existing methods, including BLASTn, Support Vector Machines (SVM), Jrip-RIPPER, J48 (C4.5 algorithm), and Naïve Bayes (NB), on the NCBI-GenBank matK dataset.

Results

Because matK gave higher discrimination success than rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on the discriminating patterns identified at the genus and species level. Our comparative assessment suggests that 46 out of 60 species could be correctly identified using the generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species). As a final outcome of this study, we converted the signatures into QR codes and developed a software tool, matK-QR Classifier (http://www.neeri.res.in/matk_classifier/index.htm), which searches for signatures in query matK gene sequences and predicts the corresponding plant species.

Conclusions

This novel approach of employing pattern-based signatures opens new avenues for the classification of species. In addition to existing methods, we believe that matK-QR Classifier would be a valuable tool for molecular taxonomists, enabling precise identification of plant species.

Electronic supplementary material

The online version of this article (doi:10.1186/s13040-016-0120-6) contains supplementary material, which is available to authorized users.
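
The signature idea described in the Methods can be illustrated with a minimal Python sketch. It assumes that a signature is a regular expression in which discriminating patterns are joined by fixed-length wildcards whose lengths equal the consecutive pattern distances. The function names, toy patterns, and species labels below are hypothetical and are not taken from the article or from matK-QR Classifier.

```python
import re


def build_signature(patterns, distances):
    """Join patterns with fixed-length wildcards: P1.{d1}P2.{d2}P3 ...

    Assumption: distances[i] is the number of nucleotides between
    patterns[i] and patterns[i + 1] in the query sequence.
    """
    if len(distances) != len(patterns) - 1:
        raise ValueError("need exactly one distance between each pair of patterns")
    parts = [re.escape(patterns[0])]
    for distance, pattern in zip(distances, patterns[1:]):
        parts.append(".{%d}" % distance)  # any 'distance' characters
        parts.append(re.escape(pattern))
    return "".join(parts)


def classify(query, signatures):
    """Return the labels whose signature occurs anywhere in the query."""
    query = query.upper()
    return [name for name, sig in signatures.items() if re.search(sig, query)]


# Toy signatures; the patterns and labels are invented for illustration.
TOY_SIGNATURES = {
    "species_a": build_signature(["ACGT", "TTGA"], [3]),
    "species_b": build_signature(["GGGG", "CCCC"], [2]),
}

if __name__ == "__main__":
    print(classify("ggACGTcccTTGAgg", TOY_SIGNATURES))
```

A query that matches no signature returns an empty list, and a query that matches several returns all of them, so a real tool would need its own rule for resolving such cases.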

Highlights

  • Seven chloroplast loci have been tested for plant species identification by the Consortium for the Barcode of Life (CBOL) Plant Working Group, which showed the suitability of the matK and rbcL loci as barcodes [12]

  • Retrieval of dataset for signature generation: for the present study, we used the plant species barcode dataset reported by the CBOL Plant Working Group and Janzen DH, which is available in the GenBank Nucleotide Database [12]

Summary

Introduction

DNA barcoding is considered a universal species identification method for plants. It mainly involves discriminating species through a standardized molecular marker gene, and it is gaining support from taxonomists as well. In plant DNA barcoding, there has been extensive debate about the choice of locus; several regions of the genome (trnH, psbA, rpoC1, rpoB, atpF, atpH, psbK, and psbI) have been proposed as candidate markers, with differing discrimination success. The two loci ribulose-1,5-bisphosphate carboxylase (rbcL) and maturase K (matK) are widely used in plant barcoding studies for phylogenetic analyses or species identification [10, 11]. Techen et al. [14] noted that the matK region was preferred as a barcode candidate because of its high evolutionary rate, low transition/transversion rate, and inter-specific divergence.
