Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

Le Tang, Songling Zhu, Shu-Lin Liu, Gui-Rong Liu, Randal N Johnston, Yu-Jie Zhou, Yong-Guo Li, Emilio Mastriani, Xin Fang, Zheng Guo

doi:10.1038/srep43565

Abstract

Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.
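The substitution experiment described in the abstract can be set up with a small helper that enumerates every single-nucleotide change inside each CTAG of a sequence. This is only a sketch of how such variants might be generated before folding; the function name `ctag_point_mutants` and the toy sequence are assumptions for illustration, not the authors' code, and no free energies are computed here.

```python
from typing import Iterator, Tuple

MOTIF = "CTAG"
BASES = "ACGT"


def ctag_point_mutants(seq: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (site_start, offset, new_base, mutant_sequence) for every
    single-base substitution inside each CTAG occurrence of `seq`.

    Hypothetical helper: each CTAG site gives 4 positions x 3 alternative
    bases = 12 mutant sequences, all of the same length as the input.
    """
    seq = seq.upper()
    start = seq.find(MOTIF)
    while start != -1:
        for offset in range(len(MOTIF)):
            pos = start + offset
            for base in BASES:
                if base != seq[pos]:
                    yield start, offset, base, seq[:pos] + base + seq[pos + 1:]
        start = seq.find(MOTIF, start + 1)


# Toy input (made up, not a Salmonella sequence).
toy = "GGGCCTAGGCCC"
for site, offset, base, mutant in ctag_point_mutants(toy):
    print(site, offset, base, mutant)
```

Each mutant sequence could then be passed to a folding program and its predicted structure compared with that of the unmodified sequence.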

Highlights

Conserved short sequences help identify functional genomic regions and facilitate genomic annotation
The tetranucleotide sequence CTAG is remarkably under-abundant in E. coli and Salmonella as previously evidenced by the relatively small numbers of endonuclease cleavage sites containing CTAG such as XbaI (TCTAGA), BlnI or AvrII (CCTAGG), and SpeI (ACTAGT)[5,15,20], a phenomenon of biased codon usage intensively studied in E. coli[21,22]
S. typhimurium LT2 and representative strains of S. typhi, S. paratyphi A, B, C, and S. gallinarum in comparison with E. coli K12, we found that the majority of the combinations have numbers greater than twelve thousand in all
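One simple way to quantify the under-abundance of CTAG noted above is an observed/expected ratio. The sketch below assumes a zero-order model in which the expected count is the number of 4-nucleotide windows multiplied by the product of the single-base frequencies; this is a common textbook choice, not necessarily the statistic used in the paper, and the name `ctag_obs_exp` is hypothetical. Because CTAG is its own reverse complement, counting on one strand covers both.

```python
from collections import Counter
from typing import Tuple


def ctag_obs_exp(seq: str, motif: str = "CTAG") -> Tuple[int, float, float]:
    """Return (observed, expected, observed/expected) for `motif` in `seq`.

    Assumption: expected count = (number of windows) * product of the
    single-base frequencies of the motif's letters (zero-order model).
    """
    seq = seq.upper()
    n, k = len(seq), len(motif)
    if n < k:
        raise ValueError("sequence is shorter than the motif")
    counts = Counter(seq)
    # Count every window that matches the motif (overlaps allowed).
    observed = sum(1 for i in range(n - k + 1) if seq[i:i + k] == motif)
    expected = float(n - k + 1)
    for base in motif:
        expected *= counts[base] / n
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


# Toy input (made up): equal base composition, one CTAG at the end.
obs, exp, ratio = ctag_obs_exp("ACGT" * 4 + "CTAG")
print(obs, exp, ratio)
```

A ratio well below 1 over a whole genome would indicate under-representation under this model; a higher-order (for example Markov) background would give a different expectation.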

Methods

General strategies of computer modeling to predict secondary structures of short sequences. The tools comprise a C code library and several stand-alone programs, including RNAfold. This program reads RNA sequences from standard input (stdin), calculates their minimum free energy (mfe) structure and prints it to standard output. CONTRAfold uses probabilistic parameters learned from a set of RNA secondary structures to predict base-pair probabilities P(i, j) and predicts structures using the maximum expected accuracy approach. RNA Package (www.tbi.univie.ac.at/RNA/), with the former for predicting pair probabilities within the equilibrium ensemble and the latter for producing a diagram of the predicted structure containing information about probability. The Perl script relplot.pl adds reliability information to an RNA secondary structure plot and computes a well-definedness measure, which we call “positional entropy” (Fig. 1). In the case of the inter-lpp-pykF sequence, we used the sequence as the query in a search against the database (https://www.ncbi.nlm.nih.gov/genome) that contains all published genomes of the Enterobacteriaceae family.
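Since RNAfold writes its result to standard output, the mfe structure and energy can be collected with a small parser. The sketch below assumes the plain mfe output layout of RNAfold (an optional `>name` line, the sequence, then a dot-bracket structure followed by the energy in parentheses), for example as produced by `RNAfold --noPS < seqs.fa`; it does not handle the extra lines printed with partition-function options. The names `Fold` and `parse_rnafold` are hypothetical, and the sample text is hand-written for illustration, not real RNAfold results.

```python
import re
from typing import List, NamedTuple, Optional


class Fold(NamedTuple):
    name: Optional[str]   # FASTA header without '>', if one was given
    sequence: str
    structure: str        # dot-bracket notation
    mfe: float            # minimum free energy in kcal/mol


# Matches lines such as "(((...))) ( -1.20)" or "......... (  0.00)".
_STRUCT = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text: str) -> List[Fold]:
    """Parse plain mfe output (assumed layout) into a list of Fold records."""
    folds: List[Fold] = []
    name: Optional[str] = None
    sequence: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip() or None
            continue
        match = _STRUCT.match(line)
        if match and sequence is not None:
            folds.append(Fold(name, sequence, match.group(1), float(match.group(2))))
            name, sequence = None, None
        elif match is None:
            sequence = line  # treat any other line as the sequence
    return folds


# Hand-written sample in the assumed layout (energies are invented).
sample = """>toy_hairpin
GGGCCUAGGCCC
((((....)))) ( -3.10)
AAAAAAAA
........ (  0.00)
"""
for fold in parse_rnafold(sample):
    print(fold.name, fold.structure, fold.mfe)
```

Comparing the parsed `mfe` of a wild-type sequence with that of its substitution variants is one way to tabulate the structural effect of changing a base.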

Results

[Figure labels recovered from the original page: Morganella morganii KT; CTAG, CTGG, CTAG; gltU, rfe, gltV, fdhF; RNase II; axis label: Number of calculated CTAG]



Journal: Scientific Reports	Publication Date: Mar 6, 2017
Citations: 7	License type: open-access
