Abstract

We have annotated the European outbreak E. coli EHEC genome sequenced by BGI (6-2-2011) and assembled with MIRA by Nick Loman (6-2-2011). Our system BG7 (Bacterial Genome annotation of Era7 Bioinformatics) predicts ORFs and annotates them based on fragments of similarity with Uniprot proteins. We have predicted 6327 genes: 6156 encoding proteins and 171 corresponding to ribosomal RNA and tRNA. Based on the preliminary results of our semi-automated annotation method, we have selected some predicted proteins with potential implications in pathogenicity and virulence. There are 33 predicted genes annotated as toxins, and we have found three putative hemolysins: Hemolysin E, a putative hemolysin expression modulating protein, and a channel protein of the hemolysin III family. We have found 31 predicted genes that could be related to specific antibiotic resistance: beta-lactam, aminoglycoside, macrolide, polymyxin, tetracycline, fosfomycin and deoxycholate, novobiocin, chloramphenicol, bicyclomycin, norfloxacin and enoxacin, and 6-mercaptopurine. This strain is rich in adhesion, secretion system, pathogenicity and virulence related proteins. It seems to have a restriction-modification system, many proteins involved in Fe transport and utilization (siderophores such as aerobactin and enterobactin), lysozyme, one inhibitor of pancreatic serine proteases, proteins involved in anaerobic respiration, antimicrobial peptides, and proteins involved in quorum sensing and biofilm formation that could confer a competitive advantage on this strain.

Highlights

  • Our system BG7 (Bacterial Genome annotation of Era7 Bioinformatics, https://registration.hinxton.wellcome.ac.uk/display_info.asp?id=227, http://www.slideshare.net/marina_manrique/bg7-a-new-system-for-bacterialgenome-annotation-designed-for-ngs-data ) predicts ORFs and annotates them based on fragments of similarity with Uniprot proteins

  • In contrast to other annotation pipelines, which find ORFs first and annotate them afterwards, BG7 first searches for protein similarity and then defines each ORF by searching for start and stop signals

  • It is designed for annotating prokaryotic genomes from NGS data, since it handles the principal errors of these technologies: false indels in homopolymer regions and substitutions
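The similarity-first strategy described in the highlights can be sketched as follows. This is a minimal illustration under stated assumptions, not BG7's actual code: the function name `extend_hit_to_orf`, the start-codon set, the forward-strand-only handling, and the choice of the farthest upstream start codon are all hypothetical choices made for the example.

```python
# Sketch: given the coordinates of a protein-similarity hit on a contig,
# define the ORF by scanning in-frame for start and stop signals.
# Assumptions (not from the source): forward strand only, bacterial
# start codons ATG/GTG/TTG, and the farthest upstream start is preferred.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def extend_hit_to_orf(seq, hit_start, hit_end):
    """Return (orf_start, orf_end) as 0-based, half-open coordinates.

    hit_start/hit_end delimit the in-frame similarity region (half-open).
    Either value in the result is None if no signal is found on the contig.
    """
    seq = seq.upper()
    if (hit_end - hit_start) % 3 != 0:
        raise ValueError("hit length must be a multiple of 3")

    # Downstream: first in-frame stop codon at or after the end of the hit.
    orf_end = None
    for i in range(hit_end, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            orf_end = i + 3
            break

    # Upstream: walk back codon by codon until an in-frame stop codon,
    # remembering the farthest start codon seen on the way.
    orf_start = None
    for i in range(hit_start, -1, -3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            orf_start = i
    return orf_start, orf_end


# Toy contig: GG TAA GCA ATG AAA CCC GGG TTT TAG CC
contig = "GGTAAGCAATGAAACCCGGGTTTTAGCC"
# Pretend the similarity search matched the codons CCC GGG (positions 14..20).
start, end = extend_hit_to_orf(contig, 14, 20)
print(start, end, contig[start:end])
```

A pipeline that tolerates sequencing errors would also need to carry the search across frameshifts and handle the reverse strand; both are left out here to keep the example short.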

Summary

RESULTS

We have predicted 6327 genes: 6156 encoding proteins and 171 corresponding to ribosomal RNA and tRNA. Of the 6156 protein-encoding genes, 1326 have canonical start and stop codons and contain neither frameshifts nor intragenic stop codons. A further 2479 protein-encoding genes include at least one frameshift or intragenic stop codon in their sequences, probably caused by errors inherent to the sequencing technology. Our system is tolerant to the errors of massive sequencing technologies and has been able to detect a rich set of genes even with very preliminary sequencing results. Some of the detected proteins are fragmented, and a protein that spans different contigs could appear as two different predicted genes.

We have analyzed the taxonomic origin of the proteins responsible for the prediction of the detected genes.

Organism

  • Escherichia coli O26:H11 (strain 11368 / EHEC)
  • Escherichia coli (strain 55989 / EAEC)
  • Escherichia coli O44:H18 (strain 042 / EAEC)
  • Escherichia coli O103:H2 (strain 12009 / EHEC)
  • Escherichia coli
  • Escherichia coli O111:H- (strain 11128 / EHEC)
  • Escherichia coli O157:H7 (strain EC4115 / EHEC)
  • Escherichia coli O157:H7 (strain TW14359 / EHEC)
  • Escherichia coli (strain K12)
  • Salmonella typhi
  • Escherichia coli O1:K1 / APEC
  • Escherichia coli (strain UTI89 / UPEC)
  • Escherichia coli O81 (strain ED1a)
  • Yersinia pestis
  • Escherichia coli O139:H28 (strain E24377A / ETEC)
  • Escherichia coli B354
  • Escherichia coli O55:H7 (strain CB9615 / EPEC)
  • Yersinia pestis Pestoides A
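The distinction the results draw between genes with canonical start and stop codons and genes carrying frameshifts or intragenic stop codons can be illustrated with a small check. This is a simplified sketch, not the method used by BG7: the function name `gene_issues` and the issue labels are hypothetical, the start-codon set is an assumption, and a length that is not a multiple of three is used here only as a crude proxy for a frameshift (a real pipeline would detect frameshifts from the similarity alignment).

```python
# Sketch: flag the problems mentioned in the results for one predicted
# coding sequence given as a DNA string on the coding strand.
# Assumptions (not from the source): start codons ATG/GTG/TTG, and
# "length not divisible by 3" as a rough stand-in for a frameshift.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def gene_issues(cds):
    """Return a list of issue labels; an empty list means 'canonical'."""
    cds = cds.upper()
    issues = []
    if len(cds) % 3 != 0:
        issues.append("frameshift")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] not in START_CODONS:
        issues.append("no_start")
    if not codons or codons[-1] not in STOP_CODONS:
        issues.append("no_stop")
    # Any stop codon strictly between the first and last codon is intragenic.
    if any(c in STOP_CODONS for c in codons[1:-1]):
        issues.append("internal_stop")
    return issues


for example in ("ATGAAATAG", "ATGTAAAAATAG", "ATGAAAATAG"):
    print(example, gene_issues(example))
```

Under a check of this kind, the 1326 genes reported as canonical would be those that return an empty list, while the 2479 genes with frameshifts or intragenic stop codons would return at least one of the corresponding labels.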
Selected predicted proteins (table columns: Gen ID, start, end, S, tags)

  • Putative peptidase
  • Putative uncharacterized protein
  • Predicted inner membrane peptidase
  • Macrolide resistance
  • Multidrug resistance protein mdtL
  • Escherichia coli
  • Multidrug resistance protein D
  • Antibiotic resistance related?
  • Multidrug resistance efflux transporter MdtE
  • Predicted methyl viologen efflux pump
  • Putative adhesin
  • Adhesin YfaL
  • Flagellar filament capping protein
  • Flagellar motor switching and energizing component protein FliG
  • Surface presentation of antigens Escherichia coli protein
  • Type III secretion system lipoprotein EprK
  • Putative type VI secretion protein
  • Type III secretion protein EprJ
  • Putative fimbrial protein
  • Outer membrane usher protein
  • Outer membrane usher protein AggC Putative fimbrial protein
  • Mercuric ion transport protein
  • Transposase Predicted transposase
  • Putative uncharacterized protein yncI
  • Truncated transposase Putative transposase
  • Escherichia coli Salmonella typhi