Abstract
Fungal genome sequencing data represent an enormous pool of information for enzyme discovery. Here, we report a new approach to identify and quantitatively compare the biomass-degrading capacity and diversity of fungal genomes via integrated function-family annotation of the carbohydrate-active enzymes (CAZymes) they encode. Based on analyses of 1932 fungal genomes, the most potent hotspots of fungal biomass-processing CAZymes are identified and ranked according to substrate degradation capacity. The analysis is achieved by a new bioinformatics approach, Conserved Unique Peptide Patterns (CUPP), which provides CAZyme-family annotation and robust prediction of molecular function; the CUPP output is then converted to lists of integrated “Function;Family” (e.g., EC 3.2.1.4;GH5) enzyme observations. An EC function found in several protein families counts as a separate observation for each family. Summing these observations allows all analyzed genome-sequenced fungal species to be ranked according to richness in CAZyme function diversity and degrading capacity. Identifying fungal CAZyme hotspots in this way reveals the fungal species richest in cellulolytic, xylanolytic, pectinolytic, and lignin-modifying enzymes. The fungal enzyme hotspots are found in fungi with very different lifestyles, ecology, physiology, and substrate/host affinity. Surprisingly, most CAZyme hotspots are found in enzymatically understudied and unexploited species. In contrast, the best-known fungal enzyme producers, from which many industrially exploited enzymes are derived, rank unexpectedly low. The results contribute to elucidating the evolution of fungal substrate-digestive CAZyme profiles, ecophysiology, and habitat adaptations, and expand the knowledge base for novel and improved biomass resource utilization.
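The counting scheme described in the abstract (one observation per distinct EC function and family pair, summed per genome) can be sketched in a few lines of Python. This is a minimal illustration, not the CUPP implementation: it assumes CUPP-style output has already been parsed into hypothetical `(protein_id, family, ec_number)` rows, and the helper names, the toy genomes, and the `CELLULOLYTIC_ECS` subset are all illustrative assumptions. The sketch reports both the number of distinct "Function;Family" pairs (a diversity measure) and the total number of proteins carrying such pairs (a rough capacity measure); how the authors weight these two is not stated here, so the sort order is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical parsed CUPP-style rows: (protein_id, family, ec_number or None).
Annotation = Tuple[str, str, Optional[str]]

# Illustrative EC subset for cellulose degradation (assumption, not the authors'
# definition): endo-1,4-beta-glucanase, cellobiohydrolase, beta-glucosidase.
CELLULOLYTIC_ECS = {"3.2.1.4", "3.2.1.91", "3.2.1.21"}


def function_family_observations(
    annotations: Iterable[Annotation], ec_filter: Optional[Set[str]] = None
) -> Counter:
    """Count integrated "Function;Family" observations such as "EC 3.2.1.4;GH5".

    The same EC number in two families produces two different keys, so it
    counts as two separate observations.
    """
    counts: Counter = Counter()
    for _protein, family, ec in annotations:
        if ec is None:
            continue  # family-only hits carry no predicted function
        if ec_filter is not None and ec not in ec_filter:
            continue  # optionally restrict to one substrate group
        counts[f"EC {ec};{family}"] += 1
    return counts


def rank_genomes(
    genomes: Dict[str, List[Annotation]], ec_filter: Optional[Set[str]] = None
) -> List[Tuple[str, int, int]]:
    """Return (genome, distinct pairs, total observations), best first."""
    rows = []
    for name, annotations in genomes.items():
        obs = function_family_observations(annotations, ec_filter)
        rows.append((name, len(obs), sum(obs.values())))
    # Rank by diversity first, then by total observations (assumed tie-break).
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


# Toy input (hypothetical species and protein IDs).
genomes = {
    "species_A": [
        ("p1", "GH5", "3.2.1.4"),
        ("p2", "GH7", "3.2.1.4"),  # same EC, different family: separate observation
        ("p3", "GH10", "3.2.1.8"),
        ("p4", "GH10", "3.2.1.8"),
    ],
    "species_B": [
        ("q1", "GH5", "3.2.1.4"),
        ("q2", "CBM1", None),
    ],
}

for row in rank_genomes(genomes):
    print(row)  # ('species_A', 3, 4) then ('species_B', 1, 1)
```

Passing `ec_filter=CELLULOLYTIC_ECS` ranks the same genomes on cellulolytic observations only; an analogous EC set per substrate (xylan, pectin, lignin) would give substrate-specific rankings under the same assumptions.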
Highlights
We use a new bioinformatics-based approach to mine the immense amounts of available fungal genomic data to identify fungi whose genomes harbor genes that encode high numbers of biomass-processing enzymes associated with conversion of the key biomass substrates: cellulose, xylan, pectin, and lignin.
Fungal genomic assemblies available in NCBI GenBank were downloaded, and assembly statistics and filtering were assessed as follows: for each fungal species, the assembly with the greatest product of sequencing coverage and contig N50 value [15] was selected as representative of the species.
The original validation of the Conserved Unique Peptide Patterns (CUPP) method showed an F-score of 0.97–0.98 for CUPP annotation, compared with 0.94–0.97 for dbCAN [11,12].
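The representative-assembly rule in the second highlight (greatest sequencing coverage multiplied by contig N50) can be sketched as below. This is a hedged illustration: the `Assembly` record, its field names, and the example accessions are hypothetical, and in practice coverage and N50 would be read from the GenBank assembly statistics rather than recomputed. The `contig_n50` helper uses the standard definition: the shortest contig length L such that contigs of length L or longer together cover at least half of the total assembly length.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Assembly:
    accession: str             # hypothetical identifier
    species: str
    coverage: float            # sequencing coverage, e.g. 50.0 for 50x
    contig_lengths: List[int]  # contig lengths in bp


def contig_n50(lengths: List[int]) -> int:
    """Shortest length L such that contigs >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def select_representatives(assemblies: List[Assembly]) -> Dict[str, Assembly]:
    """Keep, per species, the assembly with the largest coverage * contig N50."""
    best: Dict[str, Assembly] = {}
    best_score: Dict[str, float] = {}
    for asm in assemblies:
        score = asm.coverage * contig_n50(asm.contig_lengths)
        if asm.species not in best or score > best_score[asm.species]:
            best[asm.species] = asm
            best_score[asm.species] = score
    return best


# Toy input (hypothetical accessions and numbers).
assemblies = [
    Assembly("asm_X1", "Species X", 100.0, [500, 300, 200]),  # N50 500, score 50,000
    Assembly("asm_X2", "Species X", 30.0, [4000, 1000]),      # N50 4000, score 120,000
    Assembly("asm_Y1", "Species Y", 20.0, [100, 100, 100]),   # N50 100, score 2,000
]

for species, asm in select_representatives(assemblies).items():
    print(species, asm.accession)  # Species X asm_X2, then Species Y asm_Y1
```

Multiplying the two values rewards assemblies that are both deeply sequenced and contiguous, so a lower-coverage but far more contiguous assembly (here `asm_X2`) can win over a high-coverage fragmented one.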
Summary
We use a new bioinformatics-based approach to mine the immense amounts of available fungal genomic data to identify fungi whose genomes harbor genes that encode high numbers of biomass-processing enzymes associated with conversion of the key biomass substrates: cellulose, xylan, pectin, and lignin. Our aim is to contribute a new approach for efficiently using genome data to identify fungal species that feature exceptionally interesting CAZyme gene pools. This approach provides a novel basis for selecting new enzyme systems to develop more efficient biomass conversion and processing strategies for upgrading a wide spectrum of biomass resources, side streams, and wastes.