Abstract

Background: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled, and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage.

Results: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes, in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes in response to known interferon-inducing stimuli.

Conclusion: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
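To make the role of base-calling quality scores concrete, the following is a minimal sketch of how they can feed a probabilistic error model. It is not the authors' assembler. It relies only on the standard Phred definition, in which a quality score Q corresponds to an error probability of 10^(-Q/10), and on three simplifying assumptions made purely for illustration: a uniform prior over bases, independent reads, and miscalls spread evenly over the three other bases. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    e = 10.0 ** (-q / 10.0)
    # Cap at 0.75 so that a Q=0 call is simply uninformative
    # (every base is then equally likely) and log(0) never occurs.
    return min(e, 0.75)


def base_posterior(observations):
    """Posterior over the true base at a single aligned position.

    observations: list of (called_base, phred_quality) pairs, one per
    read covering the position.

    Illustrative assumptions (not taken from the paper): uniform prior,
    independent reads, errors split evenly among the other three bases.
    """
    log_like = {b: 0.0 for b in BASES}
    for called, q in observations:
        e = phred_to_error_prob(q)
        for b in BASES:
            p = (1.0 - e) if b == called else e / 3.0
            log_like[b] += math.log(p)

    # Normalise in log space for numerical stability.
    m = max(log_like.values())
    weights = {b: math.exp(v - m) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


if __name__ == "__main__":
    # Two reasonably confident "A" calls and one low-quality "G" call.
    posterior = base_posterior([("A", 30), ("A", 20), ("G", 10)])
    for base in BASES:
        print(base, round(posterior[base], 6))
```

In this toy setting the low-quality "G" call barely shifts the posterior away from "A", which is the intuition behind weighting evidence by quality score when coverage is shallow.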

Highlights

  • The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife

  • Inference of Gene Families From Trace Archives: we present the mathematical basis of the assembly method we have developed and provide a concise algorithmic implementation

  • This seed gene would be an ortholog of the genes one is attempting to infer; one collects S using similarity searching on the complete trace archive for the species of interest

Summary

Introduction

Novel human pathogens appear at a continually increasing rate. The majority of these agents are zoonotic and have their origins in wildlife [1]. SARS-like coronaviruses have been found in bats [12,13] and so far provide the closest link to the agent of human SARS. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes are not yet assembled, and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone.
