Abstract

This work reconsiders the GATC motif distribution in a 1.6 Mb segment of the Escherichia coligenome, compared to its distribution in phages and plasmids. At first sight the distribution of GATC words looks random. But when a realistic model of the chromosome (made of average genes having the same codon usage as in the real chromosome), is used as a theoretical reference, strong biases are observed. GATC pairs such as GATCNNGATC are under-represented while there is a strong positive selection for motifs separated by 10, 19, 70 and 1100 bp. The last class is the only one present in E. coliparasites. It can be ascribed to the triggering sequences of the long-patch mismatch repair system. The 6 bp class overlaps with the consensus of CAP (catabolite activator protein) and FNR (fumarate/nitrate regulator) binding sites, thus accounting for counter-selection. The other classes, which could be targets for a nucleic acid binding protein, are almost always present inside protein coding sequences, and are members of clusters of GATC motifs. Analysis of the genes containing these motifs suggests that they correspond to a regulatory process monitoring the shift from anaerobic to aerobic growth conditions. In particular this regulation, closing down transcription of a large number of genes involved in intermediary metabolism would be well suited for the cold and oxygen shift from the mammal's gut to the standard environmental conditions. In this process the methylation status of GATC clusters would be very important for tuning transcription, and a DNA binding protein, probably a member of the cold-shock proteins family would be needed for alleviating the effects mediated by slackening of the pace of methylation during the shift.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call