Abstract

Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

Highlights

  • Organization of genes in a genome is less modular than the typical portrayal of a linear series of discrete regulatory and coding regions; the density of encoded information is amplified as neighboring genes can share common nucleotides, arrangements that may have been selected because of the benefits of compressing genetic information, or because of a regulatory relationship between the overlapping sequences [1,2,3]

  • To identify proteins specified by non-annotated genes, we analyzed peptide data using a stop-to-stop database based on the Pf0-1 genome sequence

  • Products encoded by the remaining open reading frames (ORFs) may have escaped our detection because of various technical limitations associated with our experimental strategy: their low levels of expression, incompatible buffers for extractions, absence of secreted proteins, and incorrect annotations
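
A stop-to-stop database can be pictured as every stretch of amino acids lying between consecutive stop codons in all six reading frames of the genome. The sketch below is a minimal illustration of that idea in Python, not the authors' actual pipeline: the function names, the `min_len` cutoff, and the toy sequence are assumptions made for the example, and the standard genetic code is used for translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed adequate
# for a bacterial genome; only start-codon usage differs in table 11).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def stop_to_stop(genome, min_len=10):
    """Yield (strand, frame, aa_offset, segment) for all six reading frames.

    Each segment is the run of residues between two stop codons (or between
    a sequence end and the nearest stop). min_len is a hypothetical cutoff.
    aa_offset counts residues from the start of that frame's translation.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            offset = 0
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield strand, frame, offset, segment
                offset += len(segment) + 1  # +1 skips the stop codon


# Toy sequence, invented for illustration only.
toy_genome = "TAAATGAAACCCGGGTAGTTT"
for entry in stop_to_stop(toy_genome, min_len=4):
    print(entry)
```

Peptides identified by mass spectrometry could then be matched against these segments by simple substring search. A match to a segment that no annotated gene accounts for is a candidate non-annotated coding region.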

Introduction

Organization of genes in a genome is less modular than the typical portrayal of a linear series of discrete regulatory and coding regions; the density of encoded information is amplified because neighboring genes can share nucleotides, an arrangement that may have been selected for the benefits of compressing genetic information or because of a regulatory relationship between the overlapping sequences [1,2,3]. A combination of gene modeling programs has been used for annotation, such as Generation, Glimmer, Critica, and, more recently, Prodigal. Only short overlaps between the ends of coding sequences are considered during both computational and manual annotation efforts. Although the algorithms behind these gene models continue to be developed, their intrinsic dependence on earlier annotations biases them towards previous knowledge, potentially propagating errors and omissions into subsequent annotations. Genome sequence annotations seldom include overlapping genes in which one member of the pair is fully embedded within the coding sequence of the other. Continued development and application of post-annotation quality control programs like MisPred [4] should help to reduce the frequency of errors and their subsequent transmission, as will experimental verification of novel gene arrangements [e.g. 5].
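
The arrangements named in the abstract (antisense to a predicted gene, intergenic, or same strand but a different frame) can be distinguished from coordinates alone. The following is a minimal sketch under stated assumptions: the `Region` type, the 0-based half-open coordinates, the category labels, and the example coordinates are all hypothetical and are not taken from the Pf0-1 annotation.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def same_frame(a, b):
    """Compare reading frames of two regions on the same strand.

    On "+" the first codon base is at start; on "-" it is at end - 1,
    so the comparison uses end coordinates instead.
    """
    if a.strand == "+":
        return (a.start - b.start) % 3 == 0
    return (a.end - b.end) % 3 == 0


def classify(orf, annotated):
    """Label a candidate ORF relative to a list of annotated genes."""
    hits = [g for g in annotated if overlaps(orf, g)]
    if not hits:
        return "intergenic"
    same_strand = [g for g in hits if g.strand == orf.strand]
    if not same_strand:
        return "antisense"
    if any(same_frame(orf, g) for g in same_strand):
        return "same strand, same frame"
    return "same strand, different frame"


# Invented coordinates, for illustration only.
annotated_genes = [Region(100, 400, "+"), Region(600, 900, "-")]
for candidate in (Region(450, 540, "+"), Region(150, 300, "-"), Region(101, 302, "+")):
    print(candidate, classify(candidate, annotated_genes))
```

Checking same-strand overlaps before opposite-strand ones is a design choice of this sketch. A candidate that overlaps genes on both strands would need a more detailed report than a single label.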
