Abstract
Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages, and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions; substituting a single nucleotide of the CTAG sequence would radically change the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process that removes CTAG proceeding inward from both termini of the horizontally acquired segment.
Highlights
Conserved short sequences help identify functional genomic regions and facilitate genomic annotation
The tetranucleotide sequence CTAG is remarkably under-abundant in E. coli and Salmonella as previously evidenced by the relatively small numbers of endonuclease cleavage sites containing CTAG such as XbaI (TCTAGA), BlnI or AvrII (CCTAGG), and SpeI (ACTAGT)[5,15,20], a phenomenon of biased codon usage intensively studied in E. coli[21,22]
Comparing S. typhimurium LT2 and representative strains of S. typhi, S. paratyphi A, B, C, and S. gallinarum with E. coli K12, we found that the majority of the tetranucleotide combinations have counts greater than twelve thousand in all
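The tetranucleotide counts discussed in the highlights above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the single-strand overlapping-window counting, and the zero-order expectation (product of base frequencies) are all assumptions chosen for illustration, and the toy sequence is made up.

```python
from collections import Counter
from itertools import product


def count_tetranucleotides(seq):
    """Count overlapping 4-mers on one strand.

    Windows containing characters other than A, C, G, T are skipped.
    All 256 combinations are present in the result, including zero counts.
    """
    seq = seq.upper()
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=4)})
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return counts


def observed_over_expected(seq, kmer="CTAG"):
    """Ratio of the observed count to a zero-order expectation.

    The expectation assumes independent bases (an illustrative
    assumption, not a model taken from the paper).
    """
    seq = seq.upper()
    n = len(seq)
    windows = n - len(kmer) + 1
    if windows <= 0:
        return float("nan")
    expected = float(windows)
    for base in kmer:
        expected *= seq.count(base) / n
    if expected == 0:
        return float("nan")
    return count_tetranucleotides(seq)[kmer] / expected


if __name__ == "__main__":
    toy = "TCTAGACTAGT"  # made-up sequence, not genomic data
    counts = count_tetranucleotides(toy)
    print(counts["CTAG"], len(counts))
    print(observed_over_expected(toy, "CTAG"))
```

On a real genome, a ratio well below 1 for CTAG would correspond to the under-abundance described above; a whole-genome analysis would also need to decide how to treat the reverse strand (CTAG is its own reverse complement, so its count is the same on both strands).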
Summary
General strategies of computer modeling to predict secondary structures of short sequences. The Vienna RNA Package (www.tbi.univie.ac.at/RNA/) provides a C code library and several stand-alone programs, including RNAfold. This program reads RNA sequences from standard input (stdin), calculates their minimum free energy (mfe) structure, and prints it to standard output. CONTRAfold uses probabilistic parameters learned from a set of RNA secondary structures to predict base-pair probabilities P(i, j) and predicts structures using the maximum expected accuracy approach. Programs from the Vienna RNA Package were used, one for predicting pair probabilities within the equilibrium ensemble and another for producing a diagram of the predicted structure annotated with probability information. The Perl script relplot.pl adds reliability information to an RNA secondary structure plot and computes a well-definedness measure called "positional entropy" (Fig. 1). In the case of the inter-lpp-pykF sequence, we used the sequence as the query in a search against the database (https://www.ncbi.nlm.nih.gov/genome), which contains all published genomes of the Enterobacteriaceae family.
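To make the "positional entropy" measure more concrete, the sketch below computes a per-position Shannon entropy from base-pair probabilities P(i, j). It assumes the common definition S(i) = -Σ p log2 p, summed over every possible partner of position i plus the probability that i is unpaired. Whether relplot.pl uses exactly this normalization is not stated in the text above, and the function name and input format are hypothetical.

```python
import math


def positional_entropy(pair_probs, length):
    """Per-position entropy (in bits) from base-pair probabilities.

    pair_probs: dict mapping 1-based (i, j) with i < j to P(i, j).
    length:     sequence length.
    Assumed definition: S(i) = -sum(p * log2(p)) over all pairing
    partners of i, plus the unpaired probability of i.
    """
    partners = [[] for _ in range(length + 1)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)

    entropy = []
    for i in range(1, length + 1):
        unpaired = max(0.0, 1.0 - sum(partners[i]))
        s = 0.0
        for p in partners[i] + [unpaired]:
            if p > 0:
                s -= p * math.log2(p)
        entropy.append(s)
    return entropy


if __name__ == "__main__":
    # Made-up probabilities for a 4-nucleotide toy sequence.
    print(positional_entropy({(1, 4): 0.5}, 4))
```

Under this definition, a position that is paired or unpaired with certainty has entropy 0 (well defined), and the value grows as the probability is spread over more alternatives.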