Abstract

A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research.
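
The abstract does not spell out how codes are assigned. The sketch below shows one way sequential, similarity-only assignment could work. Its assumptions are ours, not necessarily the authors' exact method. Each code is a tuple of integers with one more position than there are similarity thresholds. A new genome copies the leading positions of its most similar already-coded genome, one position for every threshold that similarity reaches, and then takes the next free number at the following position. The `similarity` callback stands in for a genome similarity measure such as average nucleotide identity. The names `assign_code`/`assign_all` and the threshold values are hypothetical.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Code = Tuple[int, ...]  # hypothetical code layout: a tuple of integers


def assign_code(
    new: str,
    coded: Dict[str, Code],
    similarity: Callable[[str, str], float],
    thresholds: Sequence[float],
) -> Code:
    """Assign a code to genome `new` (hypothetical scheme; thresholds ascending)."""
    n = len(thresholds) + 1  # one position per threshold plus a final distinguishing one
    if not coded:
        return (0,) * n  # the very first genome gets the all-zero code
    # Most similar genome that already carries a code.
    best = max(coded, key=lambda g: similarity(new, g))
    s = similarity(new, best)
    # Share one leading position with the best match per threshold reached.
    k = sum(1 for t in thresholds if s >= t)
    prefix = coded[best][:k]
    # Take the next free number at position k among codes with this prefix.
    used = [c[k] for c in coded.values() if c[:k] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (n - k - 1)


def assign_all(
    genomes: Iterable[str],
    similarity: Callable[[str, str], float],
    thresholds: Sequence[float],
) -> Dict[str, Code]:
    """Assign codes one genome at a time, in submission order."""
    coded: Dict[str, Code] = {}
    for g in genomes:
        coded[g] = assign_code(g, coded, similarity, thresholds)
    return coded


if __name__ == "__main__":
    # Toy pairwise similarities (made-up numbers standing in for e.g. ANI).
    pairs = {
        frozenset("AB"): 0.95, frozenset("AC"): 0.65, frozenset("BC"): 0.62,
        frozenset("AD"): 0.20, frozenset("BD"): 0.25, frozenset("CD"): 0.30,
    }

    def sim(a: str, b: str) -> float:
        return pairs[frozenset((a, b))]

    print(assign_all("ABCD", sim, thresholds=(0.5, 0.8, 0.99)))
    # {'A': (0, 0, 0, 0), 'B': (0, 0, 1, 0), 'C': (0, 1, 0, 0), 'D': (1, 0, 0, 0)}
```

Each new code is derived only from codes that already exist, so earlier codes never change when new genomes arrive. The order in which genomes are submitted can, however, influence which codes are produced.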

Highlights

  • A classification and naming system for life on earth that is accepted and used by all members of the scientific community is a prerequisite for biological research

  • Changes in species descriptions and/or names are a challenge for researchers and can have dangerous implications for medical diagnostics when they concern pathogenic organisms. Such changes can lead to miscommunication between medical personnel about the identity of pathogens, thereby compromising the choice of the most appropriate treatment. Hundreds or thousands of new genome sequences are obtained daily, yet there is no means to classify and name these organisms at a similar speed. To address these challenges, we propose informative genome similarity-based codes that can be assigned automatically to every genome-sequenced organism, completely independently of current classification and nomenclature.

  • Since we propose to assign codes to organisms sequentially in the order in which their genomes are submitted for code assignment, it was important to determine the effect of the order of code assignment on the similarity of codes between organisms
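
One way to quantify the order effect is to assign codes to the same genomes in two different orders. For every pair of genomes, you then check whether their codes share the same number of leading positions in both runs. The sketch below illustrates such a comparison. It is not the authors' evaluation procedure. The function `order_agreement` and the code layout (tuples of integers compared by shared prefix) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

Code = Tuple[int, ...]  # hypothetical code layout: a tuple of integers


def shared_prefix_len(a: Code, b: Code) -> int:
    """Count the leading positions two codes have in common."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def order_agreement(run1: Dict[str, Code], run2: Dict[str, Code]) -> float:
    """Fraction of genome pairs whose shared-prefix length is identical
    in two code assignments of the same genomes made in different orders."""
    if run1.keys() != run2.keys():
        raise ValueError("both runs must cover the same genomes")
    pairs = list(combinations(sorted(run1), 2))
    if not pairs:
        return 1.0  # nothing to compare
    same = sum(
        shared_prefix_len(run1[a], run1[b]) == shared_prefix_len(run2[a], run2[b])
        for a, b in pairs
    )
    return same / len(pairs)
```

A value of 1.0 means both orders group every pair of genomes identically. The actual numbers may still differ, for example when two genomes swap numbers at some position.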

Summary

Introduction

A phylogenetic approach offers no advantage over a simple genome similarity-based approach. It also could not provide unique and stable identifiers for individual organisms that can be assigned as soon as a new genome sequence becomes available. It would be impossible to assign phylogeny-based codes one genome at a time. Such codes would also need to be revised whenever the addition of a new genome sequence changes the reconstructed evolutionary history of a group of organisms. In human ancestry research, comparing codes could make it very easy for people to determine how closely related they are to each other and to compare their ancestry.
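
Comparing two codes position by position gives a quick reading of how related two organisms are. The sketch below assumes a hypothetical layout in which sharing the first k positions of a code indicates that similarity reached the k-th entry of an ascending list of thresholds. The threshold values and the function names are illustrative only. When codes are assigned sequentially, the implied level is a guide rather than a guaranteed pairwise similarity.

```python
from typing import Optional, Sequence, Tuple

Code = Tuple[int, ...]  # hypothetical code layout: a tuple of integers


def shared_prefix_len(a: Code, b: Code) -> int:
    """Count the leading positions two codes have in common."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def implied_level(a: Code, b: Code, thresholds: Sequence[float]) -> Optional[float]:
    """Similarity threshold implied by the shared prefix of two codes.

    Returns None when the codes already differ at the first position,
    i.e. no threshold is implied.
    """
    k = min(shared_prefix_len(a, b), len(thresholds))
    return thresholds[k - 1] if k > 0 else None


if __name__ == "__main__":
    levels = (0.5, 0.8, 0.99)  # hypothetical, ascending
    print(implied_level((0, 0, 1, 0), (0, 0, 0, 0), levels))  # 0.8
    print(implied_level((0, 1, 0, 0), (0, 0, 1, 0), levels))  # 0.5
    print(implied_level((1, 0, 0, 0), (0, 0, 0, 0), levels))  # None
```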
