Abstract
Molecular data form an important research tool in most branches of mycology. A non-trivial proportion of the public fungal DNA sequences are, however, compromised in terms of quality and reliability, contributing noise and bias to sequence-borne inferences such as phylogenetic analysis, diversity assessment, and barcoding. In this paper we discuss various aspects and pitfalls of sequence quality assessment. Based on our observations, we provide a set of guidelines to assist in manual quality management of newly generated, near-full-length (Sanger-derived) fungal ITS sequences and to some extent also sequences of shorter read lengths, other genes or markers, and groups of organisms. The guidelines are intentionally non-technical and do not require substantial bioinformatics skills or significant computational power. Despite their simple nature, we feel they would have caught the vast majority of the severely compromised ITS sequences in the public corpus. Our guidelines are nevertheless not infallible, and common sense and intuition remain important elements in the pursuit of compromised sequence data. The guidelines focus on basic sequence authenticity and reliability of the newly generated sequences, and the user may want to consider additional resources and steps to accomplish the best possible quality control. A discussion on the technical resources for further sequence quality management is therefore provided in the supplementary material.
Highlights
The inconspicuous and largely subterranean or endophytic nature of much of fungal life presents a challenge to mycology
Discriminatory yet readily assessed morphological characters are something of a rare commodity in mycology, and morphology alone often falls short of providing unequivocal species identification and delimitation
DNA sequences represent a key source of information in most branches of mycology, including systematics, taxonomy, and ecology (Stajich et al 2009), and the landmarks include the establishment of a phylogenetic backbone and a classification system for the fungal kingdom (Blackwell et al 2006; James et al 2006; Hibbett et al 2007)
Summary
The inconspicuous and largely subterranean or endophytic nature of much of fungal life presents a challenge to mycology. The guidelines are simple and straightforward to apply; substantial bioinformatics expertise is not required, and only on-line resources of the paste-and-click type are used. Their simple nature notwithstanding, we believe that these guidelines would have caught the vast majority of the present severely compromised fungal ITS sequences in the public corpus, had they been available and applied at the time of data generation and accessioning. We would like to stress that the guidelines described here focus on basic sequence authenticity and reliability; they are certainly no panacea for sequence quality management. Their purpose is to assist in pruning severely compromised entries from newly generated, nearly full-length (typically, but not exclusively, Sanger-derived) fungal ITS datasets before those sequences are put to scientific use. One such guideline is to establish that the sequences come from the intended gene or marker.