Inferring Species Compositions of Complex Fungal Communities from Long- and Short-Read Sequence Data.

Yiheng Hu, Laszlo Irinyi, Abigail Graetz, John P Rathjen, Tavish Eenjes, Wieland Meyer, Benjamin Schwessinger, Minh Thuy Vi Hoang, Eric A Stone, B Gillian Turgeon

doi:10.1128/mbio.02444-21

Abstract

The kingdom Fungi is highly diverse in morphology and ecosystem function. Yet fungi are challenging to characterize as they can be difficult to culture and morphologically indistinct. Overall, their description and analysis lag far behind those of other microbes such as bacteria. Classification of species via high-throughput sequencing is increasingly becoming the norm for pathogen detection, microbiome studies, and environmental monitoring. With the rapid development of sequencing technologies, however, standardized procedures for taxonomic assignment of long sequence reads have not yet been well established. Focusing on nanopore sequencing technology, we compared classification and community composition analysis pipelines using shotgun and amplicon sequencing data generated from mock communities comprising 43 fungal species. We show that regardless of the sequencing methodology used, the highest accuracy of species identification was achieved by sequence alignment against a fungal-specific database. During the assessment of classification algorithms, we found that applying cutoffs to the query coverage of each read or contig significantly improved the classification accuracy and community composition analysis without major data loss. We also used nanopore data to generate draft genome assemblies for three fungal species that were absent from genome databases. Our study improves sequence-based classification and estimation of relative sequence abundance using real fungal community data and provides a practical guide for the design of metagenomics analyses focusing on fungi.
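To make the idea of a query coverage cutoff concrete, the following is a minimal sketch and not the pipeline used in the study. It assumes query coverage is the fraction of a read or contig spanned by its alignment, and that relative abundance is estimated by counting retained reads per taxon. The `Alignment` record, the function names, the taxon labels, and the threshold passed as `min_coverage` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for the best alignment of one read or contig."""
    query_id: str
    query_length: int   # total length of the read or contig
    query_start: int    # 0-based, inclusive start of the aligned region
    query_end: int      # 0-based, exclusive end of the aligned region
    subject_taxon: str  # taxon label of the matched database sequence


def query_coverage(aln: Alignment) -> float:
    """Fraction of the query covered by the alignment (assumed definition)."""
    return (aln.query_end - aln.query_start) / aln.query_length


def filter_by_query_coverage(alignments, min_coverage):
    """Keep only alignments whose query coverage meets the cutoff."""
    return [a for a in alignments if query_coverage(a) >= min_coverage]


def relative_abundance(alignments):
    """Share of retained queries assigned to each taxon."""
    counts = Counter(a.subject_taxon for a in alignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Made-up alignments; the 0.5 cutoff is an arbitrary illustrative value.
    alignments = [
        Alignment("read1", 1000, 0, 900, "species_A"),
        Alignment("read2", 1000, 100, 300, "species_B"),
        Alignment("read3", 2000, 0, 1000, "species_A"),
    ]
    kept = filter_by_query_coverage(alignments, min_coverage=0.5)
    print(relative_abundance(kept))
```

In this sketch a short, partial hit such as `read2` is discarded before abundances are computed, which is the kind of filtering the cutoff is meant to achieve. A suitable threshold depends on the data and would need to be tuned.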

Full Text