Abstract

UNITE (https://unite.ut.ee/) is a web-based database and sequence management environment for the molecular identification of fungi. It targets the formal fungal barcode—the nuclear ribosomal internal transcribed spacer (ITS) region—and offers all ∼1 000 000 public fungal ITS sequences for reference. These are clustered into ∼459 000 species hypotheses and assigned digital object identifiers (DOIs) to promote unambiguous reference across studies. In-house and web-based third-party sequence curation and annotation have resulted in more than 275 000 improvements to the data over the past 15 years. UNITE serves as a data provider for a range of metabarcoding software pipelines and regularly exchanges data with all major fungal sequence databases and other community resources. Recent improvements include redesigned handling of unclassifiable species hypotheses, integration with the taxonomic backbone of the Global Biodiversity Information Facility, and support for an unlimited number of parallel taxonomic classification systems.

Highlights

  • The fungal kingdom comprises an estimated 2.2–3.8 million species of heterotrophic eukaryotes, most of which are inconspicuous and substrate-dwelling [1]

  • The ∼600-base nuclear ribosomal internal transcribed spacer (ITS) region is the primary genetic marker for the molecular identification of fungi [2], and more than 1 000 000 full-length, Sanger-derived fungal ITS sequences are available for reference in the International Nucleotide Sequence Databases Collaboration (INSDC; 3)

  • Significant processing and annotation are necessary before the public sequences can be used for taxonomic annotation of newly generated sequence data, and the UNITE database for molecular identification of fungi (4; https://unite.ut.ee/) was launched in 2003 as a curated copy of the public fungal ITS sequences

INTRODUCTION

The fungal kingdom comprises an estimated 2.2–3.8 million species of heterotrophic eukaryotes, most of which are inconspicuous and substrate-dwelling [1]. Significant processing and annotation are necessary before the public sequences can be used for taxonomic annotation of newly generated sequence data, and the UNITE database for molecular identification of fungi (4; https://unite.ut.ee/) was launched in 2003 as a curated copy of the public fungal ITS sequences. UNITE regularly clusters all ITS sequences at several sequence similarity thresholds to obtain approximate species-level OTUs referred to as species hypotheses (SHs). All such SHs (458 797 as of August 2018) are assigned a unique digital object identifier (DOI) to allow stable, unambiguous reference across studies, even in the complete absence of meaningful taxonomic names. We detail the recent changes we have implemented in UNITE to meet the challenges posed by technological and conceptual advances in the mycological and molecular ecology communities.
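
As a rough illustration of what clustering at several similarity thresholds involves, the following Python sketch groups toy sequences greedily at three cutoffs. It is not UNITE's pipeline: the identity measure (position-by-position comparison of pre-aligned, equal-length strings), the greedy representative strategy, the threshold values, and all function and sequence names are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Assumption: a crude stand-in for a real pairwise alignment identity.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first cluster whose
    representative is at least `threshold` similar, else founds a new one."""
    clusters: List[Tuple[str, List[str]]] = []  # (representative, member ids)
    for seq_id, seq in seqs.items():
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters.append((seq, [seq_id]))
    return [members for _, members in clusters]


def cluster_at_thresholds(
    seqs: Dict[str, str], thresholds: Sequence[float]
) -> Dict[float, List[List[str]]]:
    """Run the same clustering once per threshold."""
    return {t: greedy_cluster(seqs, t) for t in thresholds}


def mutate(seq: str, positions: Sequence[int]) -> str:
    """Replace the given positions with 'N' to create guaranteed mismatches."""
    chars = list(seq)
    for p in positions:
        chars[p] = "N"
    return "".join(chars)


# Hypothetical toy data: 100-base strings, not real ITS sequences.
base = "ACGT" * 25
toy_seqs = {
    "seq_a": base,
    "seq_b": mutate(base, [10]),              # 99% identical to seq_a
    "seq_c": mutate(base, [10, 20, 30, 40]),  # 96% identical to seq_a
    "seq_d": "T" * 100,                       # 25% identical to seq_a
}

# Illustrative thresholds only.
for threshold, clusters in cluster_at_thresholds(toy_seqs, [0.95, 0.97, 0.995]).items():
    print(threshold, clusters)
```

Stricter thresholds split the same data into more, smaller clusters, which is why the choice of threshold affects how many approximate species-level units are obtained.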

Sequence data and quality control
UNITE taxonomy
Database structure and adherence to metadata standards
UNITE WEBSITE
Species hypothesis system
Identification services