The HAD (haloacid dehalogenase) superfamily includes phosphoesterases, ATPases, phosphonatases, dehalogenases, and sugar phosphomutases that act on a remarkably diverse set of substrates. The availability of numerous crystal structures of representatives from diverse branches of the HAD superfamily provides a unique opportunity to reconstruct their evolutionary history and uncover the principal determinants of their structural and functional diversification. To this end, we present a comprehensive analysis of the HAD superfamily that identifies its unique structural features and provides a detailed classification of the entire superfamily. We show that, at the highest level, the HAD superfamily is unified with several other superfamilies, namely the DHH, receiver (CheY-like), von Willebrand A, TOPRIM, classical histone deacetylase, and PIN/FLAP nuclease domains, all of which contain a specific form of the Rossmannoid fold. These Rossmannoid folds are distinguished from others by equivalently placed acidic catalytic residues, including one at the end of the first core β-strand of the central sheet. The HAD domain is distinguished from these related Rossmannoid folds by two key structural signatures, a “squiggle” (a single helical turn) and a “flap” (a β-hairpin motif), located immediately downstream of the first β-strand of the core Rossmannoid fold. The squiggle and flap motifs are predicted to give these enzymes the mobility they need to alternate between “open” and “closed” conformations. In addition, most members of the HAD superfamily contain inserts, termed caps, at either of two positions in the core Rossmannoid fold. We show that cap modules have been independently inserted into these two stereotypic positions on multiple occasions in evolution and display extensive evolutionary diversification independent of the core catalytic domain.
The first group of caps, the C1 caps, is inserted directly into the flap motif and regulates access of reactants to the active site. The second group, the C2 caps, forms a roof over the active site, and access to their internal cavities might be regulated in part by movement of the flap. Diversification of the cap module was a major factor in the exploration of a vast substrate space during the evolution of this superfamily. We show that the HAD superfamily contains 33 major families distributed across the three superkingdoms of life. Analysis of phyletic patterns suggests that at least five distinct HAD proteins are traceable to the last universal common ancestor (LUCA) of all extant organisms. While these prototypes diverged before the emergence of the LUCA, the major diversification in both substrate specificity and reaction type occurred after the radiation of the three superkingdoms of life, primarily in bacteria. Most major diversification events appear to correlate with the acquisition of new metabolic capabilities, especially the elaboration of carbohydrate metabolism in bacteria. The newly identified relationships and functional predictions provided here are likely to aid future exploration of the many poorly understood members of this large enzyme superfamily.