Abstract

We extend the self-organizing approach for annotating a bacterial genome to the analysis of raw sequencing data from the human gut metagenome, without sequence assembly. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven ‘phases’: one for noncoding regions, three for direct coding regions (indicating the three possible codon positions of the segment's starting site), and three for reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types, or ‘codon usages’. A set of codon usages can be used to update the phase assignment, and vice versa. After an initialization of either the phase assignment or the codon usage tables, iteration leads to a convergent phase assignment, which gives an annotation of the genome. To extend the approach to a metagenome, we consider a mixture model with a number of categories of genomes. Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome.

Highlights

  • The majority of microbes in our body reside in the gut

  • After an initialization of phase assignment or codon usage tables, an iteration leads to a convergent phase assignment to give an annotation of the genome

  • A primitive survey shows that the tables of E. coli, B. vulgatus and F

Summary

INTRODUCTION

The majority of microbes in our body reside in the gut. They are crucial for human life. In this paper, we extend the self-organizing approach for annotating a bacterial genome to the analysis of raw data from the gut metagenome, without sequence assembly. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns each segment to one of seven ‘phases’: one for noncoding regions, three for direct coding regions (indicating the three possible codon positions of the segment's starting site), and three for reverse coding regions. After an initialization of either the phase assignment or the codon usage tables, iteration leads to a convergent phase assignment, which gives an annotation of the genome. Shifting all the windows by three nucleotides, we repeat the iteration to obtain a new convergent phase assignment. The distance between the noncoding and coding tables estimated from the annotated E. coli genome is 9.24, while the distance between the predicted coding distribution and the one extracted from the known annotation is less than 0.15.
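The iteration described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the log-likelihood scoring rule, the random initialization, the pseudocount, the default segment length of 90, and all function names (`split_segments`, `phase_triplets`, `estimate_tables`, `assign_phases`, `self_organize`) are hypothetical choices made for the example. Phase 0 stands for noncoding, phases 1 to 3 for the direct coding frames, and phases 4 to 6 for the reverse coding frames.

```python
import math
import random
from collections import Counter
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]  # the 64 triplet types
COMPLEMENT = str.maketrans("ACGT", "TGCA")
N_PHASES = 7  # 0: noncoding; 1-3: direct coding frames; 4-6: reverse coding frames


def split_segments(genome, length):
    """Cut the sequence into non-overlapping segments of equal length."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, length)]


def phase_triplets(segment, phase):
    """Triplets read from a segment under a given phase (assumed convention)."""
    if phase >= 4:  # reverse coding: read the reverse complement instead
        segment = segment.translate(COMPLEMENT)[::-1]
    offset = 0 if phase == 0 else (phase - 1) % 3
    n = (len(segment) - 2) // 3  # same number of triplets for every phase
    return [segment[offset + 3 * k: offset + 3 * k + 3] for k in range(n)]


def estimate_tables(segments, phases, pseudocount=1.0):
    """Update the two frequency tables (noncoding, coding) from an assignment."""
    counts = [Counter(), Counter()]
    for seg, ph in zip(segments, phases):
        counts[0 if ph == 0 else 1].update(phase_triplets(seg, ph))
    tables = []
    for c in counts:
        total = sum(c[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
        tables.append({t: (c[t] + pseudocount) / total for t in TRIPLETS})
    return tables


def assign_phases(segments, tables):
    """Update the assignment: each segment takes its highest-scoring phase."""
    noncoding, coding = tables
    phases = []
    for seg in segments:
        scores = []
        for ph in range(N_PHASES):
            table = noncoding if ph == 0 else coding
            # Triplets containing letters other than A, C, G, T are skipped.
            scores.append(sum(math.log(table[t])
                              for t in phase_triplets(seg, ph) if t in table))
        phases.append(max(range(N_PHASES), key=scores.__getitem__))
    return phases


def self_organize(genome, length=90, max_iter=100, seed=0):
    """Alternate table and phase updates until the assignment stops changing."""
    rng = random.Random(seed)
    segments = split_segments(genome, length)
    phases = [rng.randrange(N_PHASES) for _ in segments]  # random initialization
    for _ in range(max_iter):
        tables = estimate_tables(segments, phases)
        new_phases = assign_phases(segments, tables)
        if new_phases == phases:
            break
        phases = new_phases
    return phases, estimate_tables(segments, phases)
```

Calling `self_organize(sequence)` returns one phase label per segment together with the final noncoding and coding tables. On a real genome the result would depend on the initialization and segment length, which is one reason to treat this as a toy illustration of the alternating update rather than a working annotator.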

SELF-ORGANIZING APPROACH FOR THE HUMAN GUT METAGENOME
Single-category Model
Two-category Model
Three-category 1n3c Model
Annotating Samples with Trained Triplet Tables
D N DN q Diabetic q Normal
Findings
CONCLUDING REMARKS
