Escherichia coli, a venerable workhorse for biochemical and genetic studies and for the large-scale production of recombinant proteins, is one of the most intensively studied of all organisms. The natural habitat of E. coli is the gastrointestinal tract of warm-blooded animals, and in humans, this species is the most common facultative anaerobe in the gut. Although most strains exist as harmless symbionts, there are many pathogenic E. coli strains that can cause a variety of diseases in animals and humans. In addition, from an evolutionary perspective, strains of the genus Shigella are so closely related phylogenetically that they are included in the group of organisms recognized as E. coli (1, 2). Pathogenic E. coli strains differ from those that predominate in the enteric flora of healthy individuals in that they are more likely to express virulence factors, which are molecules directly involved in pathogenesis but ancillary to normal metabolic functions. Expression of these virulence factors disrupts the normal host physiology and elicits disease. In addition to their role in disease processes, virulence factors presumably enable the pathogens to exploit their hosts in ways unavailable to commensal strains, and thus to spread and persist in the bacterial community. It is a mistake to think of E. coli as a homogeneous species. Most genes, even those encoding conserved metabolic functions, are polymorphic, with multiple alleles found among different isolates (1). The composition of the genome of E. coli is also highly dynamic. The fully sequenced genome of the laboratory K-12 strain, whose derivatives have served an indispensable role in the laboratories of countless scientists, shows evidence of tremendous plasticity (3). It has been estimated that the K-12 lineage has experienced more than 200 lateral transfer events since it diverged from Salmonella about 100 million years ago and that 18% of its contemporary genes were obtained horizontally from other species (4).
Such fluid gain and loss of genetic material are also seen in the recent comparison of the genomic sequence of a pathogenic E. coli O157:H7 with the K-12 genome. Approximately 4.1 million base pairs of “backbone” sequences are conserved between the genomes, but these stretches are punctuated by hundreds of sequences present in one strain but not in the other. The pathogenic strain contains 1.34 million base pairs of lineage-specific DNA that includes 1,387 new genes; some of these have been implicated in virulence, but many have no known function (5). The virulence factors that distinguish the various E. coli pathotypes were acquired from numerous sources, including plasmids, bacteriophages, and the genomes of other bacteria. Pathogenicity islands, relatively large (>10 kb) genetic elements that encode virulence factors and are found specifically in the genomes of pathogenic strains, frequently have base compositions that differ drastically from that of the rest of the E. coli genome, indicating that they were acquired from another species. Here, we explore some of the known virulence factors that contribute to the heterogeneity of E. coli strains, and we review what is known regarding the origin and distribution of these factors.