Abstract

Background

Multilocus Sequence Typing (MLST) is a frequently used typing method for the analysis of the clonal relationships among strains of several clinically relevant microbial species. MLST is based on the sequences of housekeeping genes, which give each strain a distinct numerical allelic profile that is abbreviated to a unique identifier: the sequence type (ST). The relatedness between two strains can then be inferred from the differences between their allelic profiles. For a more comprehensive analysis of the possible patterns of evolutionary descent, a set of rules was proposed and implemented in the eBURST algorithm. These rules allow the division of a data set into several clusters of related strains, dubbed clonal complexes, by implementing a simple model of clonal expansion and diversification. Within each clonal complex, the rules identify which links between STs correspond to the most probable pattern of descent. However, the eBURST algorithm is not globally optimized, which can result in links within the clonal complexes that violate the proposed rules.

Results

Here, we present goeBURST, a globally optimized implementation of the eBURST algorithm. The search for a globally optimal solution led to the formalization of the problem as a graphic matroid, for which greedy algorithms that provide an optimal solution exist. Several public MLST data sets were tested, and differences between the two implementations were found and are discussed for five bacterial species: Enterococcus faecium, Streptococcus pneumoniae, Burkholderia pseudomallei, Campylobacter jejuni and Neisseria spp. A novel feature implemented in goeBURST is the representation of the level of tiebreak rule reached before deciding if a link should be drawn, which can be used to visually evaluate the reliability of the represented hypothetical pattern of descent.

Conclusion

goeBURST is a globally optimized implementation of the eBURST algorithm that identifies alternative patterns of descent for several bacterial species. Furthermore, the algorithm can be applied to any multilocus typing data based on the number of differences between numeric profiles. A software implementation is available at .
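
The abstract states that relatedness is inferred from the differences between allelic profiles. As a minimal sketch of that idea, the snippet below counts the loci at which two profiles differ. The function name `allelic_distance` and the seven-locus profiles are hypothetical illustrations, not data or code from the article.

```python
from typing import Sequence


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Return the number of loci at which two allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical seven-locus profiles keyed by ST number (made-up values).
profiles = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (1, 1, 1, 1, 1, 1, 2),  # differs from ST 1 at one locus
    3: (1, 1, 1, 1, 1, 3, 2),  # differs from ST 1 at two loci
}

d12 = allelic_distance(profiles[1], profiles[2])
d13 = allelic_distance(profiles[1], profiles[3])
d23 = allelic_distance(profiles[2], profiles[3])
```

Under this count, two STs at distance 1 are single locus variants of each other, and the same function applies to any typing scheme that yields numeric profiles of equal length.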

Highlights

  • Multilocus Sequence Typing (MLST) is a frequently used typing method for the analysis of the clonal relationships among strains of several clinically relevant microbial species

  • A recent simulation study showed that the eBURST definition of clonal complexes and the inferred pattern of descent within them is reliable in conditions comparable to those of the majority of natural bacterial populations of many different species while uncovering conditions when eBURST performance is suboptimal [37]

  • If one considers only the allelic profiles to derive a minimum spanning tree (MST) connecting all sequence types (STs), multiple optimal solutions exist because the space of ST differences is limited and discrete. eBURST is similar to finding an MST on the entire data set, but restricts the links to those between single locus variants (SLVs) and selects the trees with the highest quality as defined by a set of rules
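
The last highlight describes the analysis as a spanning-tree search restricted to SLV links, with rules that choose among equally short links. The sketch below illustrates that greedy, Kruskal-style selection with a union-find structure. It is not the goeBURST implementation: the tie-break used here (prefer links whose endpoints have more SLVs, then lower ST numbers) is a simplified stand-in assumed for illustration, and the function name `slv_forest` and the three-locus profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def slv_forest(profiles):
    """Greedily pick SLV links that never close a cycle.

    profiles: dict mapping ST number -> tuple of allele numbers.
    Returns (links, complexes): the chosen (st_u, st_v) links and the
    groups of STs that those links connect.
    """
    sts = sorted(profiles)
    # Candidate links: only pairs that differ at exactly one locus.
    slv_links = [(u, v) for u, v in combinations(sts, 2)
                 if hamming(profiles[u], profiles[v]) == 1]

    # SLV count per ST, used by the illustrative tie-break below.
    n_slv = {st: 0 for st in sts}
    for u, v in slv_links:
        n_slv[u] += 1
        n_slv[v] += 1

    def priority(link):
        # Assumed tie-break: more SLVs first, then lower ST numbers.
        u, v = link
        hi, lo = max(n_slv[u], n_slv[v]), min(n_slv[u], n_slv[v])
        return (-hi, -lo, u, v)

    parent = {st: st for st in sts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    links = []
    for u, v in sorted(slv_links, key=priority):
        ru, rv = find(u), find(v)
        if ru != rv:  # accept the link only if it joins two components
            parent[ru] = rv
            links.append((u, v))

    groups = {}
    for st in sts:
        groups.setdefault(find(st), []).append(st)
    return links, sorted(groups.values())


# Hypothetical three-locus profiles (made-up values).
example = {
    1: (1, 1, 1),
    2: (1, 1, 2),
    3: (1, 1, 3),
    4: (1, 2, 1),
    5: (7, 8, 9),
}
links, complexes = slv_forest(example)
```

In this example ST 1 has the most SLVs, so its links are considered first and the link between STs 2 and 3 is rejected because it would close a cycle. ST 5 has no SLV and remains a group of its own.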


Summary

Methodology article

Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach.

Address: (1) Instituto de Engenharia de Sistemas e Computadores – ID em Lisboa, Rua Alves Redol 9, 1000-029 Lisboa, Portugal, (2) Instituto Superior Técnico, Universidade Técnica de Lisboa, Av. Rovisco Pais, 1049-001 Lisboa, Portugal and (3) Instituto de Microbiologia/Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof.

Published: 18 May 2009. BMC Bioinformatics 2009, 10:152. doi:10.1186/1471-2105-10-152

