Abstract

Background: Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, morphological identification of those species is almost impossible. DNA barcoding, however, can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, it remains challenging because of the high degree of variability among ITS regions within species.

Results: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions and then used as training and test sets, respectively, for prediction of fungal species using RF. Accuracy exceeded 85% when 4 sequences per species in the reference set were used, and it stabilized at ~88% when ≥7 sequences per species were used to train the model. When evaluated against existing methods through cross-validation, the proposed model achieved comparable accuracy. It also outperformed several existing models used for identification of species other than fungi.

Conclusions: An online prediction server, “funbarRF”, has been established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF (https://cran.r-project.org/web/packages/funbarRF/), is available for prediction using high-throughput sequence data. This work should supplement future efforts in fungal taxonomy assignment based on DNA barcodes.
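The feature mapping mentioned in the Results can be illustrated with a minimal Python sketch. It assumes that a "gapped base pair composition" is the relative frequency of each of the 16 ordered nucleotide pairs whose two bases are separated by a fixed number of intervening positions. The function names, the choice of gaps 0 to 2, and the handling of ambiguous bases are illustrative assumptions, not details taken from the paper or from the funbarRF package.

```python
from itertools import product

BASES = "ACGT"
# The 16 ordered base pairs: AA, AC, AG, ..., TT
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_pair_composition(seq, gap):
    """Relative frequencies of the 16 ordered base pairs whose two
    bases are separated by `gap` intervening positions."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # assumption: skip pairs containing N or other codes
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def encode(seq, max_gap=2):
    """Concatenate the compositions for gaps 0..max_gap into one
    fixed-length numeric vector (16 values per gap)."""
    features = []
    for gap in range(max_gap + 1):
        features.extend(gapped_pair_composition(seq, gap))
    return features


if __name__ == "__main__":
    vector = encode("ACGTTGCAACGT")
    print(len(vector))  # 16 pairs x 3 gaps
```

Every sequence, whatever its length, is mapped to a vector of the same dimension, which is what allows reference and query barcodes to be used as training and test rows for a classifier. Such vectors could then be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, with species names as labels.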

Highlights

  • Identification of unknown fungal species aids the conservation of fungal diversity

  • Considering the importance of barcoding in the preservation of species diversity as well as for other applications, the Consortium for the Barcode of Life (CBOL) has been continuously emphasizing on … In Mycofier, a naïve Bayes classifier coupled with k-mer (k = 5) features was adopted for identification of fungi at the genus level [18]

  • Although concerted efforts have gone into developing the above-mentioned tools and techniques, which have advanced our knowledge of species identification using DNA barcodes, there is still room for further improvement

Summary

Introduction

Identification of unknown fungal species aids the conservation of fungal diversity. Prediction of unknown fungal specimens and conservation of their genomic resources are vital for studying and preserving fungal diversity [2]. Identification of specimens that lack morphological characters is often difficult [3]. To address this, molecular techniques such as DNA barcoding [4] have been successfully employed in recent years. Although concerted efforts have gone into developing the above-mentioned tools and techniques, which have advanced our knowledge of species identification using DNA barcodes, there is still room for further improvement. Supervised machine learning techniques such as the naïve Bayes classifier, kNN and Bayesian regression models have been successfully employed for taxonomy assignment of fungal species, as evidenced by the above-mentioned studies. We propose a supervised learning-based prediction model that identifies fungal species by analyzing their barcode sequences. We believe that the developed approach will supplement the existing tools and techniques for species identification using DNA barcodes.
